Replace sprintf with snprintf where straightforward to do so.
- sprintf's into local scope buffers of known size are replaced with the
equivalent snprintf call
- snprintf's into local scope buffers of known size that use the buffer
size expression now use sizeof(buffer)
- sprintf(buf + strlen(buf), ...) replaced with snprintf() into temp
buffer followed by strlcat
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
When we receive an UPDATE with MP_NEXTHOP len as 32 bytes, we shouldn't
check if the global (1st) nexthop is unspecified.
Peering between bird and FRRouting we receive from Bird something like:
```
rcvd UPDATE w/ attr: , origin i, mp_nexthop ::(fe80::a00:27ff:fe09:f8a3)
```
The link-local (2nd) nexthop is valid and validated later in the code.
Before it was marked:
```
IPv6 unicast -- DENIED due to: martian or self next-hop;
```
After it's a valid prefix:
```
spine1-debian-9# show bgp
BGP table version is 0, local router ID is 2.2.2.2, vrf id 0
Default local pref 100, local AS 65002
Status codes: s suppressed, d damped, h history, * valid, > best, = multipath,
i internal, r RIB-failure, S Stale, R Removed
Nexthop codes: @NNN nexthop's vrf id, < announce-nh-self
Origin codes: i - IGP, e - EGP, ? - incomplete
Network Next Hop Metric LocPrf Weight Path
2a02:4780::/64 fe80::a00:27ff:fe09:f8a3
0 65001 i
Displayed 1 routes and 1 total paths
```
Signed-off-by: Donatas Abraitis <donatas.abraitis@gmail.com>
Replace all `random()` calls with a function called `frr_weak_random()`
and make it clear that it is only supposed to be used for weak random
applications.
Use the annotation described by the Coverity Scan documentation to
ignore `random()` call warnings.
Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
In real world sometimes happens that bgp_nexthop_cache is NULL. Avoid
segfaulting when using `show [ip] bgp ...` CLI commands.
Signed-off-by: Donatas Abraitis <donatas.abraitis@gmail.com>
Rather than doing a f*gly hack for the RPKI code, let's do an on-exit
hook in cmd_node. Also allows replacing some special-casing in the vty
code.
Signed-off-by: David Lamparter <equinox@diac24.net>
And again for the name. Why on earth would we centralize this, just so
people can forget to update it?
Signed-off-by: David Lamparter <equinox@diac24.net>
Same as before, instead of shoving this into a big central list we can
just put the parent node in cmd_node.
Signed-off-by: David Lamparter <equinox@diac24.net>
There is really no reason to not put this in the cmd_node.
And while we're add it, rename from pointless ".func" to ".config_write".
[v2: fix forgotten ldpd config_write]
Signed-off-by: David Lamparter <equinox@diac24.net>
The only nodes that have this as 0 don't have a "->func" anyway, so the
entire thing is really just pointless.
Signed-off-by: David Lamparter <equinox@diac24.net>
The problem is when using kinda such topologies:
(192.168.1.1/32) r1 <-- eBGP --> r2 <-- iBGP --> r3
Looking at r3's nexthop for 192.168.1.1/32 we have it as r2, but really
it MUST be r1.
Checking if the nexthop is connected solves the problem even for cases
when route-reflectors are used.
Signed-off-by: Donatas Abraitis <donatas.abraitis@gmail.com>
This fixes unnecessary whitespaces and makes capitalization
match for route type help strings.
Signed-off-by: Trey Aspelund <taspelund@cumulusnetworks.com>
Some competitive vendors like Cisco, Bird, OpenBGPD,
Nokia already have this by default enabled.
The list is here: https://github.com/bgp/RFC8212
Signed-off-by: Donatas Abraitis <donatas.abraitis@gmail.com>
Problem Description:
=====================
+--+ +--+
|R1|-(192.201.202.1)----iBGP----(192.201.202.2)-|R2|
+--+ +--+
Routes on R2:
=============
S>* 202.202.202.202/32 [1/0] via 192.201.78.1, ens256, 00:40:48
Where, the next-hop network, 192.201.78.0/24, is a directly connected network address.
C>* 192.201.78.0/24 is directly connected, ens256, 00:40:48
Configurations on R1:
=====================
!
router bgp 201
bgp router-id 192.168.0.1
neighbor 192.201.202.2 remote-as 201
!
Configurations on R2:
=====================
!
ip route 202.202.202.202/32 192.201.78.1
!
router bgp 201
bgp router-id 192.168.0.2
neighbor 192.201.202.1 remote-as 201
!
address-family ipv4 unicast
redistribute static
exit-address-family
!
Step-1:
=======
R1 receives the route 202.202.202.202/32 from R2.
R1 installs the route in its BGP RIB.
Step-2:
=======
On R1, a connected interface address is added.
The address is the same as the next-hop of the BGP route received from R2 (192.201.78.1).
Point of Failure:
=================
R1 resolves the BGP route even though the route's next-hop is its own connected address.
Even though this appears to be a misconfiguration it would still be better to safeguard the code against it.
Fix:
====
When BGP receives a connected route from Zebra, it processes the
routes for the next-hop update.
While doing so, BGP must ignore routes whose next-hop address matches
the address of the connected route for which Zebra sent the next-hop update
message.
Signed-off-by: NaveenThanikachalam <nthanikachal@vmware.com>
Ensure that upon a link-bandwidth change - for e.g., due to change in
the number of multipaths - EVPN type-5 route injection is triggered.
In the absence of this, the proper link-bandwidth is not updated in
EVPN type-5 routes originated by the router.
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
take into account polychaeta tips ono code style.
also, take into account miscellaneous code style recommandations like
braces usage.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Multiple different issues causing mostly UAFs but maybe other more
subtle things.
- Cluster lists were the only attributes whose pointers were not being
NULL'd when freed, resulting in heap UAF
- When performing an insert into the cluster hash, our temporary struct
used for hash_get() was inconsistent with our hash keying and
comparison functions. In the case of a zero length cluster list, the
->length field is 0 and the ->list field is NULL. When performing an
insert, we set the ->list field regardless of whether the length is 0.
This resulted in the two cluster lists hashing equal but not comparing
equal. Later, when removing one of them from the hash before freeing
it, because the key matched and the comparison succeeded (because it
was set to NULL *after* the search but *before* inserting into the
hash) we would sometimes release the duplicated copy of the struct,
and then free the one that remained in the hash table. Later accesses
constitute UAF. This is fixed by making sure the fields used for the
existence check match what is actually inserted into the hash when
that check fails.
This patch also makes cluster_unintern static, because it should be.
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
... Oops ...
(for context, the defaults code originally didn't have a dedicated
"bool" variant and just used long for bools... I derp'd this when
adding bool as a separate case :( )
Reported-by: Donald Sharp <sharpd@cumulusnetworks.com>
Signed-off-by: David Lamparter <equinox@diac24.net>
This macro is undefined if vnc is disabled, and while it defaults to 0,
this is still wrong and causes issues with -Werror
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
This is a full rewrite of the "back end" logging code. It now uses a
lock-free list to iterate over logging targets, and the targets
themselves are as lock-free as possible. (syslog() may have a hidden
internal mutex in the C library; the file/fd targets use a single
write() call which should ensure atomicity kernel-side.)
Note that some functionality is lost in this patch:
- Solaris printstack() backtraces are ditched (unlikely to come back)
- the `log-filter` machinery is gone (re-added in followup commit)
- `terminal monitor` is temporarily stubbed out. The old code had a
race condition with VTYs going away. It'll likely come back rewritten
and with vtysh support.
- The `zebra_ext_log` hook is gone. Instead, it's now much easier to
add a "proper" logging target.
v2: TLS buffer to get some actual performance
Signed-off-by: David Lamparter <equinox@diac24.net>
- each statistics is encapsulated into concatenated "<afi><safi>" value.
- the json encoding for floating and double values is using json api
double api. this change is done for bgp statistics.
- the lines over 80 characters have been handled.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
this command is a shortcut to facilitate the extraction of statistics
for all afi/safi related to one bgp instance.
the command is: show bgp [vrf XX] statistics-all [json]
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
safis that use a route distinguisher in bgp tables, and as such
introduce a two level hierarchy on the bgp table, must be made available
to statistics too.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
add json support for show bgp statistics command.
The title of the stats entry is aggregated without spaces.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
The BGP Router MAC extended community should be unique and not occur
multiple times. In a VRF-to-VRF route-leak scenario where EVPN routes
from a source VRF are leaked into the target VRF and then injected
back into EVPN from the target VRF, the resulting route had more than
one RMAC. With this fix, the resulting route will have only the
target VRF's RMAC.
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
The EVPN advertise route-map may generate extended communities for an IPv4
or IPv6 route injected into EVPN as type-5. If so, allow for it and add
to it.
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
Reviewed-by: Don Slice <dslice@cumulusnetworks.com>
Reviewed-by: Donald Sharp <sharpd@cumulusnetworks.com>
Support configurable options to control how link bandwidth is handled
by the receiver. The default behavior is to automatically honor the
link bandwidths received and use it to perform a weighted ECMP BUT only
if all paths in the multipath have associated link bandwidth; if one or
more paths do not have link bandwidth, normal ECMP is performed among
the multipaths. This behavior is as recommended by
https://tools.ietf.org/html/draft-ietf-idr-link-bandwidth.
The additional options available are to (a) completely ignore any link
bandwidth (i.e., weighted ECMP is effectively disabled), (b) skip paths
in the multipath which do not have link bandwidth and perform weighted
ECMP among the other paths (if at least some paths have the bandwidth)
or (c) use a default weight (value chosen is 1) for the paths which
do not have link bandwidth.
The command syntax is
bgp bestpath bandwidth <ignore|skip-missing|default-weight-for-missing>
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
When announcing ourselves as the next hop (e.g., to EBGP peers), if the
best path has the link bandwidth extended community and it is transitive,
change the value of the link bandwidth to the cumulative downstream
bandwidth (sum of the link bandwidths of all our multipaths) as this
makes the most sense. It is also implied by
https://tools.ietf.org/html/draft-mohanty-bess-ebgp-dmz. Of course, do
not override the link bandwidth if it has been specified by policy.
Note: Transitive extended communities will be automatically passed along
to EBGP peers; this commit is updating the value that is announced to
something that is the most appropriate.
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
Reviewed-by: Donald Sharp <sharpd@cumulusnetworks.com>
Reviewed-by: Don Slice <dslice@cumulusnetworks.com>
Implement the code to handle the other route-map options to generate
the link bandwidth, namely, to use the cumulative bandwidth or to
base this on the number of multipaths. In the latter case, a reference
bandwidth is internally chosen - the implementation uses a value of
1 Gbps.
These additional options mean that the prefix may need to be advertised
if there is a link bandwidth change, which is a new criteria. Define a
new path (change) flag to support this and implement the advertisement.
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
Reviewed-by: Donald Sharp <sharpd@cumulusnetworks.com>
Reviewed-by: Don Slice <dslice@cumulusnetworks.com>
The BGP link bandwidth extended community must not be repeated. If the
attribute already carries this and the route-map specifies a new value,
the implementation will honor the policy configuration and overwrite
the existing values.
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
Certain extended communities cannot be repeated. An example is the
BGP link bandwidth extended community. Enhance the extended community
add function to ensure uniqueness, if requested.
Note: This commit does not change the lack of uniqueness for any of
the already-supported extended communities. Many of them such as the
BGP route target can obviously be present multiple times. Others like
the Router's MAC should most probably be present only once. The portions
of the code which add these may already be structured such that duplicates
do not arise.
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
Perform weighted ECMP if the multipaths have link bandwidth. This involves
assigning weights to each of the next hops associated with the prefix based
on the link bandwidth of the corresponding path as a factor of the total
(cumulative) link bandwidth for the prefix. The weight values used are
between 1 and 100. Weights are assigned only if all paths in the multipath
have link bandwidth, otherwise any bandwidths are ignored and regular
ECMP is performed. This is as recommended in
https://tools.ietf.org/html/draft-ietf-idr-link-bandwidth
A subsequent commit will implement additional (user-configurable) behaviors.
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
Reviewed-by: Donald Sharp <sharpd@cumulusnetworks.com>
Reviewed-by: Don Slice <dslice@cumulusnetworks.com>
During multipath update, track the cumulative link bandwidth
as well as update flags appropriately.
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
Reviewed-by: Donald Sharp <sharpd@cumulusnetworks.com>
Reviewed-by: Don Slice <dslice@cumulusnetworks.com>
Introduce fields in the multipath structure for link bandwidth handling.
In the process, the mp_count field is changed to a uint16 as that is the
value set anyway.
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
Additional extended community definitions and display of link-bandwidth
extended community.
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
Implement route-map option to set the link-bandwidth extended
community. The command is of the form:
set extcommunity bandwidth <(1-26214400)|cumulative|num-multipaths>
[non-transitive]
The options available are to specify the actual bandwidth value in
Mbps, base it on the cumulative downstream bandwidth or base it on
the number of multipaths. The last option is based on
https://tools.ietf.org/html/draft-mohanty-bess-ebgp-dmz. Further,
in alignment with the use case described in this IETF draft, the
extended community is encoded as transitive by default. There is an
option available to specify that it should be non-transitive.
The link-bandwidth itself is carried in bytes per second as specifed in
https://tools.ietf.org/html/draft-ietf-idr-link-bandwidth
Note: This commit only handles the processing for bandwidth specifed
as a value; subsequent commits will handle the processing of the other
options.
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
Reviewed-by: Donald Sharp <sharpd@cumulusnetworks.com>
Reviewed-by: Don Slice <dslice@cumulusnetworks.com>
In the past, we always displayed the number of buffered ingress packets
as zero because there was no packet buffering in the input path and
therefore never any queue size to report. They're buffered now so we can
display something meaningful instead of 0.
Also change the inq / outq lookups to be atomic, since they can be
modified elsewhere. These should still compile down to an unfenced word
read but it's good to be explicit.
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
https://lists.frrouting.org/pipermail/frog/2020-March/000776.html
It was pointed out that we are not properly passing the nexthop
through and instead we were replacing the nexthop as a Route Server
with our own.
https://tools.ietf.org/html/rfc4456#section-4
10. Implementation Considerations
Care should be taken to make sure that none of the BGP path
attributes defined above can be modified through configuration when
exchanging internal routing information between RRs and Clients and
Non-Clients. Their modification could potentially result in routing
loops.
In addition, when a RR reflects a route, it SHOULD NOT modify the
following path attributes: NEXT_HOP, AS_PATH, LOCAL_PREF, and MED.
Their modification could potentially result in routing loops.
Modify the code such that when FRR is instructed to act as a
Route-Server to pass through the nexthop.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Add new function `bgp_node_get_prefix()` and modify
the bgp code base to use it.
This is prep work for the struct bgp_dest rework.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Show if this malformed under `show [ip] bgp <prefix>`:
```
eva# sh ip bgp 103.79.124.0/22
BGP routing table entry for 103.79.124.0/22
Paths: (1 available, best #1, table default)
Advertised to non peer-group peers:
192.168.201.136
64539 15096 6939 7545 7545 136001, (aggregated by 0(malformed) 0.0.0.0)
192.168.201.136 from 192.168.201.136 (192.168.201.136)
Origin IGP, valid, external, best (First path received)
Last update: Thu Mar 26 10:02:07 2020
```
Signed-off-by: Donatas Abraitis <donatas.abraitis@gmail.com>
Having a full feed this leads to unknown. You can't point which prefix or
aspath has this malforming behavior.
Printing just `[EC 33554434] AGGREGATOR attribute is BGP_AS_ZERO(0)` isn't
enough, you can't directly pin-point where is the problem.
Additionally print at least aspath here:
```
[EC 33554434] AGGREGATOR AS number is 0 for aspath: 65000 65031
```
Overall the full table has only 6 such malformed prefixes:
```
aspath: 64539 15096 6939 45430 45458
aspath: 64539 15096 6939 1299 3257 34984 34984 34984 34984 34984 51174
aspath: 64539 15096 6939 286 34984 16135 16135 {16135}
aspath: 64539 15096 6939 7545 7545 136001
aspath: 64539 15096 6939 6762 3269 20746
aspath: 64539 15096 6939 7018 3379
```
Signed-off-by: Donatas Abraitis <donatas.abraitis@gmail.com>
Line break at the end of the message is implicit for zlog_* and flog_*,
don't put it in the string. Mid-message line breaks are currently
unsupported. (LF is "end of message" in syslog.)
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
Problem seen that if "import vrf route-map RMAP" was entered
without any vrfs being imported, the configuration was displayed
as "route-map vpn import RMAP". Additionally, if "import vrf
route-map" was entered without specifying a route-map name,
the command was accepted and the word "route-map" would be
treated as a vrf name. This fix resolves both of those issues
and also allows deleting the "import vrf route-map" line without
providing the route-map name.
Ticket: CM-28821
Signed-off-by: Don Slice <dslice@cumulusnetworks.com>
Modify code to use lookup function agg_node_get_prefix()
as the abstraction layer. When we rework bgp_node to
bgp_dest this will allow us to greatly limit the amount
of work needed to do that.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Future work needs the ability to specify a
const struct prefix value. Iterate into
bgp a bit to get this started.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Defer the grabbing of the prefix for as long as is possible.
This is a long term rework of how we access the `struct bgp_node`
to only use accessor functions.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
More second order effects of cleaning up rn usage
in bgp. Sprinkle the fairy const's all over the place.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Modify more code to use `const struct prefix` throughout
bgp. This is all prep work for adding an accessor function
for bgp_node to get the prefix and reduce all the places that
code needs to be touched when we get that work done.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Tell the compiler that the prefix is being used for lookups
and it will never change.
Setup for future work.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Some were converted to bool, where true/false status is needed.
Converted to void only those, where the return status was only false or true.
Signed-off-by: Donatas Abraitis <donatas.abraitis@gmail.com>
Convert some status defines for the fsm to an enum
so that we cannot mix and match them in the future.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
In PR #6052 which fixes issue #5963 the bgp fsm events
were confused with the bgp fsm status leading
to a bug. Let's start separating those out
so these types of failures cannot just
easily occur.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Ensure that the EVPN advertise route-map is applied on a copy of the
original path_info and associated attribute, so that if the route-map
has SET clauses, they can operate properly. This closely follows
the model already in use in other route-map application code.
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
Reviewed-by: Don Slice <dslice@cumulusnetworks.com>
Reviewed-by: Donald Sharp <sharpd@cumulusnetworks.com>
During VRF-to-VRF route leaking, strip any extraneous route targets. This
ensures that source-VRF-specific route targets or route targets that are
internally assigned for the VRF-to-VRF route leaking don't get attached
to the route in the target VRF.
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
Reviewed-by: Donald Sharp <sharpd@cumulusnetworks.com>
Reviewed-by: Don Slice <dslice@cumulusnetworks.com>
Extended communities like the BGP Route Target can be present multiple
times in a route's path attribute. Ensure that the strip function for a
particular extended community (type and subtype) handles this and
strips all occurrences.
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
Reviewed-by: Donald Sharp <sharpd@cumulusnetworks.com>
Reviewed-by: Don Slice <dslice@cumulusnetworks.com>
A BGP update-group is dynamically created to group together a set of peers
such that any BGP updates can be formed just once for the entire group and
only the next hop attribute may need to be modified when the update is sent
out to each peer in the group. The update formation code attempts to
determine as much as possible if the next hop will be set to our own IP
address for every peer in the group. This helps to avoid additional checks
at the point of sending the update (which happens on a per-peer basis) and
also because some other attributes may/could vary depending on whether the
next hop is set to our own IP or not. Resetting the next hop to our own IP
address is the most common behavior for EBGP peerings in the absence of
other user-configured or internal (e.g., for l2vpn/evpn) settings and
peerings on a shared subnet.
The code had a flaw in the multiaccess check to see if there are peers in
the update group which are on a shared subnet as the next hop of the path
being announced - the source peer could itself be in the same update group
and cause the check to give an incorrect result. Modify the check to skip
the source peer so that the check is more accurate.
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
Reviewed-by: Donald Sharp <sharpd@cumulusnetworks.com>
Reviewed-by: Don Slice <dslice@cumulusnetworks.com>
this command is missing, compared with 'match ipv6 next-hop' command
available. Adding it by taking into account the backward compatible
effect when supposing that some people have configured acls with name
being an ipv4 address.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
The bgp reason code was being reset in bgp_best_selection
by rerunning bgp_path_info_cmp multiple times under certain
receiving patterns of data from peers.
This is the debugs that show this issue:
2020/03/16 19:17:22.523780 BGP: 2001:20:1:1::6 rcvd UPDATE w/ attr: nexthop 20.1.1.6, origin i, metric 600, community 1000:1006, path 20
2020/03/16 19:17:22.523819 BGP: 2001:20:1:1::6 rcvd 20.10.0.6/32 IPv4 unicast
2020/03/16 19:17:22.556168 BGP: 20.1.1.6 rcvd UPDATE w/ attr: nexthop 20.1.1.6, origin i, metric 500, community 1000:1006, path 20
2020/03/16 19:17:22.556209 BGP: 20.1.1.6 rcvd 20.10.0.6/32 IPv4 unicast
2020/03/16 19:17:22.572358 BGP: bgp_process_main_one: p=20.10.0.6/32 afi=IPv4, safi=unicast start
2020/03/16 19:17:22.572408 BGP: 20.10.0.6/32: Comparing path 2001:20:1:1::6 flags 0x410 with path 20.1.1.6 flags 0x410
2020/03/16 19:17:22.572415 BGP: 20.10.0.6/32: path 2001:20:1:1::6 loses to path 20.1.1.6 due to MED 600 > 500
2020/03/16 19:17:22.572422 BGP: 20.10.0.6/32: path 20.1.1.6 is the bestpath from AS 20
2020/03/16 19:17:22.572429 BGP: 20.10.0.6/32: path 20.1.1.6 is the initial bestpath
2020/03/16 19:17:22.572435 BGP: bgp_best_selection: pi 0x5627187c66c0 dmed
2020/03/16 19:17:22.572441 BGP: 20.10.0.6/32: After path selection, newbest is path 20.1.1.6 oldbest was NONE
2020/03/16 19:17:22.572447 BGP: 20.10.0.6/32: path 20.1.1.6 is the bestpath, add to the multipath list
2020/03/16 19:17:22.572453 BGP: 20.10.0.6/32: path 2001:20:1:1::6 has the same nexthop as the bestpath, skip it
2020/03/16 19:17:22.572460 BGP: 20.10.0.6/32: starting mpath update, newbest 20.1.1.6 num candidates 1 old-mpath-count 0 old-cum-bw u0
2020/03/16 19:17:22.572466 BGP: 20.10.0.6/32: comparing candidate 20.1.1.6 with existing mpath NONE
2020/03/16 19:17:22.572473 BGP: 20.10.0.6/32: New mpath count (incl newbest) 1 mpath-change NO all_paths_lb 0 cum_bw u0
Effectively if BGP receives 2 paths it could end up running bgp_path_info_cmp multiple times
and in some situations overwrite the reason selected the first time through.
In this example path selection is run and the MED is the reason for the choice.
Then in bgp_best_selection is run again this time clearing new_select
to NULL before calling path selection for the first time. This second
call into path selection resets the reason, since it is only passing in one
path. So save the last reason selected and restore in this case.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
This scenario has been seen against microtik virtual machine with
bfd enabled. When remote microtik bgp reestablishes the bgp session
after a bgp reset, the bgp establishment comes first, then bfd is
initialising.
The second point is true for microtik, but not for frrouting, as the
frrouting, when receiving bfd down messages, is not at init state.
Actually, bfd state is up, and sees the first bfd down packet from
bfd as an issue. Consequently, the BGP session is cleared.
The fix consists in resetting the BFD session, only if bfd status is
considered as up, once BGP comes up.
That permits to align state machines of both local and remote bfd.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
When bgp is updated with local source, the bgp session is reset; bfd
also must be reset. The bgp_stop() handler handles all kind of
unexpected failures, so the placeholder to deregister from bfd should be
ok, providing that when bgp establishes, a similar function in bgp will
recreate bfd context.
Note that the bfd session is not reset on one specific case, where BFD
down event is the last reset. In that case, we must let BFD to monitor
the link.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
If the default BGP instance is importing routes from another instance and
the latter has a router-id update, the update handler needs to handle the
default instance in a special way.
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
Reviewed-by: Chirag Shah <chirag@cumulusnetworks.com>
Reviewed-by: Don Slice <dslice@cumulusnetworks.com>
Ticket: CM-26007
Reviewed By: CCR-9108
Testing Done: Detailed verification in 3.x
Ensure that the late registration for NHT done for IPv4 route exchange
over IPv6 GUA peering is not attempted for peer-groups, only for peers.
Fixes: "bgpd: Late registration of Extended Nexthop"
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
Reviewed-by: Donald Sharp <sharpd@cumulusnetworks.com>
This scenario has been seen against microtik virtual machine with bfd
enabled. When remote microtik bgp reestablishes the bgp session after a
bgp reset, the bgp establishment comes first, then bfd is initialising.
The second point is true for microtik, but not for frrouting, as the
frrouting, when receiving bfd down messages, is not at init state.
Actually, bfd state is up, and sees the first bfd down packet from bfd
as an issue. Consequently, the BGP session is cleared.
The fix consists in resetting the BFD session, once BGP comes up. That
permits to align state machines of both local and remote bfd.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
When bgp is updated with local source, the bgp session is reset; bfd
also must be reset. The bgp_stop() handler handles all kind of
unexpected failures, so the placeholder to deregister from bfd should be
ok, providing that when bgp establishes, a similar function in bgp will
recreate bfd context.
Note that the bfd session is not reset on one specific case, where BFD
down event is the last reset. In that case, we must let BFD to monitor
the link.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
This fixes a linking issue on Fedora Rawhide:
/usr/bin/ld: bgpd/libbgp.a(bgp_flowspec.o):/home/ruben/src/frr/./bgpd/bgp_attr_evpn.h:37: multiple definition of `eth_tag_id'; bgpd/bgp_btoa-bgp_btoa.o:/home/ruben/src/frr/./bgpd/bgp_attr_evpn.h:37: first defined here
collect2: error: ld returned 1 exit status
Signed-off-by: Ruben Kerkhof <ruben@rubenkerkhof.com>
We were using XMALLOC for these, and only initializing the refcount to 0
on one of them. Let's just use XCALLOC instead...
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
RCA: When doppelganger still around and clear bgp is issued
there are chances of peer getting deleted and next pointer
is a freed peer pointer.
Fix: Pass address of nnode to get next safe peer pointer.
Signed-off-by: Santosh P K <sapk@vmware.com>
It's been a year search and destroy.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
We already have a generic support for add/sub in route-maps. It's already
handled in route_value_compile().
Just convert to string (allow passing (-) minus sign) - works like expected.
Signed-off-by: Donatas Abraitis <donatas.abraitis@gmail.com>
Some code in bgp_route_refresh_receive was spread across several
lines because of an end of line commit. Move comment to a place
to allow better formating.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
SA has found a case where we did a table lookup of a rn( and
associated lock of that node ) where we did not unlock it.
Unlock the node before moving on to the next one.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Currently During bgp open collision resolution if both
the router-id's are the same, we correctly follow
the RFC and close the connection. The problem is of course
that there is no notification of the error in configuration
to the end user other than a subtle open debug message.
Explicitly call out the miss-configuration as an error message
as that this miss-config took several hours of debugging to notice.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
annie# show bgp ipv4 uni summ
BGP router identifier 192.168.201.136, local AS number 64539 vrf-id 0
BGP table version 22458946
RIB entries 1458006, using 178 MiB of memory
Peers 4, using 68 KiB of memory
Neighbor V AS MsgRcvd MsgSent TblVer InQ OutQ Up/Down State/PfxRcd
45.33.5.119 4 0 0 0 0 0 0 never Active
65.19.134.122 4 15096 4611832 108292 0 0 0 6d22h55m 800670
107.13.46.23 4 0 0 0 0 0 0 never Connect
robot(192.168.201.139) 4 64540 11159975 11365599 0 0 0 05w2d05h Connect
Total number of neighbors 4
On very busy systems The column output for MsgRcvd and MsgSent can quickly move past 7 columns.
Add a couple more to allow for even display.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
If the peer was shutdown locally, it doesn't show up as admin. shutdown.
Instead it's treated as "Waiting for peer OPEN".
The same applies to when the peer reaches maximum-prefix count.
Signed-off-by: Donatas Abraitis <donatas.abraitis@gmail.com>
During route-map processing we return an enum, the rpki
code was doing some extra gyrations that were unnecessary.
Simplify.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Prefix-SID path attribute Label-index TLV (type-1) is
used by SR-MPLS. And Label-index TLV MUST ignored
if that path attribute is append on non-Labeled-unicast
UPDATE message described on [ref1].
There is a problem case exist arround this implementation.
This commit fix that.
Before this commit,
unfortunally, setting Label-Index value is skipped at somecases.
because, Label-Index TLV implementation check the AFI/SAFI pair.
by mp_update variable that is set by bgp_mp_reach_parse function.
if MP_REACH_NLRI is present after PREFIX_SID, bgp_attr_psid_sub
function can't understand AFI/SAFI pair. and the order of each
path attributes is never no-deterministic thing for receiver.[ref2]
In this commit,
I re-located checking code of AFI/SAFI pair after path-attr loop.
[ref1](https://tools.ietf.org/html/draft-ietf-idr-bgp-prefix-sid-27#section-3.2)
> The Originator SRGB TLV may only appear in a BGP Prefix-SID attribute
> attached to IPv4/IPv6 Labeled Unicast prefixes ([RFC8277]). It MUST
> be ignored when received for other BGP AFI/SAFI combinations.
[ref2](https://tools.ietf.org/html/rfc4271#section-5)
> The sender of an UPDATE message SHOULD order path attributes within
> the UPDATE message in ascending order of attribute type. The
> receiver of an UPDATE message MUST be prepared to handle path
> attributes within UPDATE messages that are out of order.
Signed-off-by: Hiroki Shirokura <slank.dev@gmail.com>
Prefix-SID is desined to capable for TLV array.
That behaviour is important to support SR-MPLS feature
and that supported by previous PR #5418.
In that implementation, but if some additional data
(such as next BGP update message or next path attributes)
was present after Prefix-SID path attribute,
bgpd will parse that addional data as Prefix-SID TLV.
This commit fix that. before this commit, loop condition
is determed by stream is readable or not. In more correct
implementatoin, the prefix-sid boundaly should be checked
additonally. the length of Prefix-sid path attribute can
be get by bgp_attr_parse_args.
Signed-off-by: Hiroki Shirokura <slank.dev@gmail.com>
Override ORIGIN attribute if defined.
E.g.: Cisco and Juniper set ORIGIN for aggregated address
to IGP which is not what rfc4271 says.
This enables the same behavior, optionally.
Signed-off-by: Donatas Abraitis <donatas.abraitis@gmail.com>
Track the returned peer_sorted value and use it where
we can and recalculate where necessary.
This is an effort to reduce the amount of work done here.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The act of peer_sort() being called always set this value
even when we are just looking it up. We need to seperate
out the idea of lookup from set.
For those places that this is immediately obvious that
this is a lookup switch over to using this function.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
A route where ESI, GW IP, MAC and Label are all zero at the same time SHOULD
be treat-as-withdraw.
Invalid MAC addresses are broadcast or multicast MAC addresses. The route
MUST be treat-as-withdraw in case of an invalid MAC address.
As FRR support Ethernet NVO Tunnels only.
Route will be withdrawn when ESI, GW IP and MAC are zero or Invalid MAC
Test cases:
1) ET-5 route with valid RMAC extended community
2) ET-5 route no RMAC extended community
3) ET-5 route with Multicast MAC in RMAC extended community
4) ET-5 route with Broadcast MAC in RMAC extended community
Signed-off-by: Kishore Aramalla <karamalla@vmware.com>
Current failed reasons for bgp when you have a peer that
is not online yet is `Waiting for NHT`, even if NHT has
succeeded. Add some code to differentiate this.
eva# show bgp ipv4 uni summ failed
BGP router identifier 192.168.201.135, local AS number 3923 vrf-id 0
BGP table version 0
RIB entries 0, using 0 bytes of memory
Peers 2, using 43 KiB of memory
Neighbor EstdCnt DropCnt ResetTime Reason
192.168.44.1 0 0 never Waiting for NHT
192.168.201.139 0 0 never Waiting for Open to Succeed
Total number of neighbors 2
eva#
eva# show bgp nexthop
Current BGP nexthop cache:
192.168.44.1 invalid, peer 192.168.44.1
Must be Connected
Last update: Mon Feb 10 19:05:19 2020
192.168.201.139 valid [IGP metric 0], #paths 0, peer 192.168.201.139
So 192.168.201.139 is a peer for a connected route that has not been
created on .139, while 44.1 nexthop tracking has not succeeded yet.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
RCA:
When we install IPv6 prefix imported from EVPN RT-5 in vrf, nexthop of the IPv6
route should be IPv4 mapped IPv6 address. In function
install_evpn_route_entry_in_vrf, we generate a new attribute with IPv4 mapped
IPv6 nexthop, but we use parent->attr while creating the actual route.
Thus, Ipv4 nexthop is assigned to this route.
Because of this incorrect nexthop, we observed a crash in function
update_ipv6nh_for_route_install.
Fix:
Pass the new attribute with Ipv4 mapped Ipv6 nexthop to
bgp_create_evpn_bgp_path_info
Signed-off-by: Ameya Dharkar <adharkar@vmware.com>
For some reason we are getting a compile error around a variable I didn't
touch in the other commits. Make it happy.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
There is no need to have a temp variable to then store that
data in another temporary variable.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The new_afi and afi were being used over and over. Switch
to the end result we want and just use that from the get go.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The creation of a prefix pointer is unnecessary. Save the
prefix as part of the actual data structure. This will
reduce the data needed by 8 bytes per nexthop stored.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
According to https://tools.ietf.org/html/rfc7606 some of the attributes
MUST be handled as "treat-as-withdraw" approach.
Signed-off-by: Donatas Abraitis <donatas.abraitis@gmail.com>
bgp flowspec packets are being forged correctly. There is no need to
check for bgp length, as the bgp nlri length is checked at reception.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
There is no need for a call into get_afi_safi_str for the
json side since we add it based upon the afi safi str below.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The coverity SA believes that the regex value can possibly
be NULL. Not possible so let's make it happy.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
If you have enums handled in a switch adding a default case
makes it fun to fix when new stuff is added later. Remove.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
For Graceful restart clients have to send GR capabilities
library functions are added to encode capabilities and
also for zebra to decode client capabilities.
Co-authored-by: Santosh P K <sapk@vmware.com>
Co-authored-by: Soman K S <somanks@vmware.com>
Signed-off-by: Santosh P K <sapk@vmware.com>
Two of the evpn show commands with json option has memory leak.
1) show bgp l2vpn evpn route vni all json
2) show bgp l2vpn evpn route esi json
Before fix:
----------
Executed 'show bgp l2vpn evpn route vni all json' multiple times
used ordinary blocks continue to increase.
Note at the time of show command capture there were 22 evpn routes
in vni evpn route table.
Memory statistics for bgpd:
System allocator statistics:
Total heap allocated: 9152 KiB
Holding block headers: 0 bytes
Used small blocks: 0 bytes
Used ordinary blocks: 7300 KiB
Free small blocks: 1760 bytes
Free ordinary blocks: 1852 KiB
Ordinary blocks: 880
Small blocks: 51
Holding blocks: 0
Ticket:CM-27920
Reviewed By:
Testing Done:
After fix:
---------
Executed 'show bgp l2vpn evpn route vni all json' multiple times
Used ordinary blocks remains low.
Memory statistics for bgpd:
System allocator statistics:
Total heap allocated: 8356 KiB
Holding block headers: 0 bytes
Used small blocks: 0 bytes
Used ordinary blocks: 6492 KiB
Free small blocks: 1840 bytes
Free ordinary blocks: 1864 KiB
Ordinary blocks: 939
Small blocks: 52
Holding blocks: 0
Signed-off-by: Chirag Shah <chirag@cumulusnetworks.com>
Found memory leak in json output of evpn's route
commands.
After executing 'show bgp l2vpn evpn route type prefix json'
and 'show bgp l2vpn evpn route type macip json' few times
(6 times) with more than 600 routes in total seeing
memory footprint for bgpd continue to grow.
Memory statistics for bgpd:
System allocator statistics:
Total heap allocated: 12 MiB
Holding block headers: 0 bytes
Used small blocks: 0 bytes
Used ordinary blocks: 8390 KiB
Free small blocks: 1760 bytes
Free ordinary blocks: 3762 KiB
Ordinary blocks: 1161
Small blocks: 51
Holding blocks: 0
Ticket:CM-27920
Testing Done:
After fix:
excute few times,
'show bgp l2vpn evpn route type prefix json'
and 'show bgp l2vpn evpn route type macip json'
commands where used ordinary blocks (uordblks) is
in steady state.
Memory statistics for bgpd:
System allocator statistics:
Total heap allocated: 9968 KiB
Holding block headers: 0 bytes
Used small blocks: 0 bytes
Used ordinary blocks: 6486 KiB
Free small blocks: 1984 bytes
Free ordinary blocks: 3482 KiB
Ordinary blocks: 1110
Small blocks: 54
Holding blocks: 0
Memory statistics for bgpd:
System allocator statistics:
Total heap allocated: 10100 KiB
Holding block headers: 0 bytes
Used small blocks: 0 bytes
Used ordinary blocks: 6488 KiB
Free small blocks: 1984 bytes
Free ordinary blocks: 3612 KiB
Ordinary blocks: 1113
Small blocks: 54
Holding blocks: 0
Signed-off-by: Chirag Shah <chirag@cumulusnetworks.com>
* While the Deferral timer is running, signal route update pending
(ZEBRA_CLIENT_ROUTE_UPDATE_PENDING) from BGPD to Zebra.
* After expiry of the Deferral timer, the deferred routes are processed.
When the deferred route_list becomes empty, End-of-Rib is send to the
peer and route processing complete message (ZEBRA_CLIENT_ROUTE_UPDATE_COMPLETE)
is sent to Zebra. So that Zebra would delete any stale routes still
present in the rib.
Signed-off-by: Biswajit Sadhu <sadhub@vmware.com>
* Added CLI commands to update rib-stale-time, running in
Cmd : "bgp gaceful-restart rib-stale-time (1-3000)".
Cmd : "no bgp gaceful-restart rib-stale-time".
* Integrating the hooks function for signalling from BGPD
to ZEBRA to ZEBRA to enable or disable GR feature in ZEBRA
depending on bgp per peer gr configuration.
Signed-off-by: Biswajit Sadhu <sadhub@vmware.com>
*Adding helper caller hooks function for signalling from BGPD
to ZEBRA to enable or disable GR feature in ZEBRA depending
on bgp per peer gr configuration.
Signed-off-by: Biswajit Sadhu <sadhub@vmware.com>
*Adding helper function for signalling from BGPD to ZEBRA to
enable or disable GR feature in ZEBRA depending on bgp per
peer gr configuration.
Signed-off-by: Biswajit Sadhu <sadhub@vmware.com>
Data Structures, function declaration and Macros forSignalling
from BGPD to ZEBRA to enable or disable GR feature in ZEBRA
depending on bgp per peer gr configuration.
Signed-off-by: Biswajit Sadhu <sadhub@vmware.com>
*After a restarting router comes up and the bgp session is
successfully established with the peer. If the restarting
router doesn’t have any route to send, it send EOR to
the peer immediately before receiving updates from its peers.
*Instead the restarting router should send EOR, if the
selection deferral timer is not running OR count of eor received
and eor required are matches then send EOR.
Signed-off-by: Biswajit Sadhu <sadhub@vmware.com>
BGP disable EOR sending is a useful command for testing various
scenarios of BGP graceful restart.
* Added the hidden CLI command : bgp graceful-restart disable-eor
* The CLI will not be displayed in "show running-config" and will not
be stored in configuration file.
* When enabled, EOR will not be sent to peer
Signed-off-by: Biswajit Sadhu <sadhub@vmware.com>
Signed-off-by: Soman K S <somanks@vmware.com>
When the peer router's gr mode had changed from helper/restart
to disable. The local bgp gr router should reset the peer
router's restart-time stored.
Signed-off-by: Biswajit Sadhu <sadhub@vmware.com>
bgp tcp connection.
When the BGP peer is configured between two bgp routes both routers would create
peer structure , when they receive each other’s open message. In this event both
speakers, open duplicate TCP sessions and send OPEN messages on each socket
simultaneously, the BGP Identifier is used to resolve which socket should be closed.
If BGP GR is enabled the old tcp session is dumped and the new session is retained.
So while this transfer of connection is happening, if all the bgp gr config
is not migrated to the new connection, the new bgp gr mode will never get applied.
Fix Summary:
1. Replicate GR configuration from the old session to the new session in bgp_accept().
2. Replicate GR configuration from stub to full-fledged peer in bgp_establish().
3. Disable all NSF flags, clear stale routes (if present), stop restart & stale timers
(if they are running) when the bgp GR mode is changed to “Disabled”.
4. Disable R-bit in cap, if it is not set the received open message.
Signed-off-by: Biswajit Sadhu <sadhub@vmware.com>
BGP GR Neighbor mode is showing the default string as “NotRecieved”,
as the bgp gr neighbour capability was not processed,
since the local mode is “Disable”.
However now it would be changed to “NotApplicable”.
Signed-off-by: Biswajit Sadhu <sadhub@vmware.com>
& GR is enabled.
When GR with deferral is enabled and connected routes are
distributed then in one race condition route node gets added
in to both deferred queue and work queue. If deferred queue
gets processed first then it ends up delete only flag while
leaving the entry in the work queue as it is. When a new update
comes for the same route node next time from peer then it hits
assert. Assert check is added to ensure we don’t add to work queue
again while it is already present.
So, check before adding in to deferred queue if it is already present
in work queue and bail if so.
Signed-off-by: Biswajit Sadhu <sadhub@vmware.com>
* Changing GR mode on a router needs a session reset from the
SAME router to negotiate new GR capability.
* The present GR implementation needs a session reset after every
new BGP GR mode change.
* When BGP session reset happens due to sending or receiving BGP
notification after changing BGP GR mode, there is no need of
explicit session reset.
Signed-off-by: Biswajit Sadhu <sadhub@vmware.com>
* BGP GR Neighbour mode in show command would show as
“NotApplicable”, when local mode is “Disable”. As the bgp
gr neighbour capability was not processed, since the local mode
is “Disable”.
* Minor changes in show Selection Deferral Time.
Signed-off-by: Biswajit Sadhu <sadhub@vmware.com>
* Selection Deferral Timer for Graceful Restart.
* Added selection deferral timer handling function.
* Route marking as selection defer when update message is received.
* Staggered processing of routes which are pending best selection.
* Fix for multi-path test case.
Signed-off-by: Biswajit Sadhu <sadhub@vmware.com>
and DS.
* Added config commands and data structures for deferral timer
configuration and processing.
Cmd : bgp graceful-restart select-defer-time (0-3600)
Cmd : no bgp graceful-restart select-defertime (0-3600)
Signed-off-by: Biswajit Sadhu <sadhub@vmware.com>
Signed-off-by: Soman K S <somanks@vmware.com>
* Added new show command to show the graceful restart
information for each neighbor.
Cmd: show bgp [<ipv4|ipv6>] neighbors [<A.B.C.D|X:X::X:X|WORD>] graceful-restart
* Changes to show neighbors commands for displaying
graceful restart information.
Cmd :show [ip] bgp [<view|vrf> VIEWVRFNAME] [<ipv4|ipv6>] neighbors [<A.B.C.D|X:X::X:X|
Signed-off-by: Biswajit Sadhu <sadhub@vmware.com>
* Changes to the capability sending function to advertise
graceful restart capability in the bgp OPEN message.
Signed-off-by: Biswajit Sadhu <sadhub@vmware.com>
* Added FSM for peer and global configuration for graceful restart
* Added debug option BGP_GRACEFUL_RESTART for logs specific to
graceful restart processing
Signed-off-by: Biswajit Sadhu <sadhub@vmware.com>
Duplicated domain name capability messages cause memory leak. The amount
of leaked memory is proportional to the size of the duplicated
capabilities. This bug was introduced in 2015.
To hit this, a BGP OPEN message must contain multiple FQDN capabilities.
Memory is leaked when the hostname portion of the capability is of
length 0, but the domainname portion is not, for any of the duplicated
capabilities beyond the first one.
https://tools.ietf.org/html/draft-walton-bgp-hostname-capability-00
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
When withdrawing addpaths, adj_lookup was called to find the path that
needed to be withdrawn. It would lookup in the RB tree based on subgroup
pointer alone, often find the path with the wrong addpath ID, and return
null. Only the path highest in the tree sent to the subgroup could be
found, thus withdrawn.
Adding the addpath ID to the sort criteria for the RB tree allows us to
simplify the logic for adj_lookup, and address this problem. We are able
to remove the logic around non-addpath subgroups because the addpath ID
is consistently 0 for non-addpath adj_outs, so special logic to skip
matching the addpath ID isn't required. (As a side note, addpath will
also never use ID 0, so there won't be any ambiguity when looking at the
structure content.)
Signed-off-by: Mitchell Skiba <mskiba@amazon.com>
The vrrpd one conflicts with the standalone vrrpd package; also we're
installing daemons to /usr/lib/frr on some systems so they're not on
PATH.
Signed-off-by: David Lamparter <equinox@diac24.net>
bgpd already supports BGP Prefix-SID path attribute and
there are some sub-types of Prefix-SID path attribute.
This commits makes bgpd to support additional sub-types.
sub-Type-4 and sub-Type-5 for construct the VPNv4 SRv6 backend
with vpnv4-unicast address family.
This path attributes is already supported by Ciscos IOS-XR and NX-OS.
Prefix-SID sub-Type-4 and sub-Type-5 is defined on following
IETF-drafts.
Supports(A-part-of):
- https://tools.ietf.org/html/draft-dawra-idr-srv6-vpn-04
- https://tools.ietf.org/html/draft-dawra-idr-srv6-vpn-05
Signed-off-by: Hiroki Shirokura <slank.dev@gmail.com>
For evpn routes, nexthop and RMAC fileds are synced
in route add to zebra.
In case of EVPN routes display RMAC field in route add
debug log.
Reviewed By:CCR-9381
Testing Done:
BGP: nhop [1]: 27.0.0.11 if 30 VRF 26 RMAC 00:02:00:00:00:2e
Signed-off-by: Chirag Shah <chirag@cumulusnetworks.com>
This commit makes bgpd to support VPNv4's extended
nexthop capability for bgp-capability negotiation
when BGP open messaging.
Signed-off-by: Hiroki Shirokura <slank.dev@gmail.com>
uint8_t * cannot be cast to uint32_t * unless the
pointed-to address is aligned according to uint32_t's
alignment rules. And it usually is not.
Signed-off-by: Santosh P K <sapk@vmware.com>
With this change, we are able to set attributes via route-map to the default
route. It's useful in cases where we have two or more spines and we want to
prefer one router over others for leaves. This simplifies configuration instead
of using 'network 0.0.0.0/0' or 'ip route 0.0.0.0/0 ...' and 'redistribute
static' combination.
Signed-off-by: Donatas Abraitis <donatas.abraitis@gmail.com>
uint8_t * cannot be cast to uint32_t * unless the pointed-to address is
aligned according to uint32_t's alignment rules. And it usually is not.
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
advertise pip running configuration should
display ip followed by mac parameters value as defined
in cli signature.
advertise-pip is enabled by default, when displaying the
running configuration, there is '\n' added after
ip and mac parameters which was not guarded around
the non-default parameters.
Currently, for every bgp vrf instance it ends up
displaying l2vpn address-family section due to
unguarded newline.
running config:
router bgp 6004 vrf vrf1
!
address-family l2vpn evpn
exit-address-family
!
Ticket:CM-26964
Testing Done:
With fix when only 'router bgp 6004 vrf vrf1' configured,
running config looks like:
!
router bgp 6004 vrf vrf1
!
Signed-off-by: Chirag Shah <chirag@cumulusnetworks.com>
This commit is about #5629 's issue.
Before this commit, bgpd creates format string of
bgp-route-distinguisher as int32, but correctly format
is uint32. current bgpd's sh-run-cli generate int32 rd,
so if user sets the rd as 1:4294967295(0x1:0xffffffff),
sh-run cli generates 1: -1 as running-config. This
commit fix that issue.
Signed-off-by: Hiroki Shirokura <slank.dev@gmail.com>
Guess what - for a bounds check to work, it has to happen *before* you
read the data. We were trusting the attribute field received in a prefix
SID attribute and then checking if it was correct afterwards, but if was
wrong we'd crash before that.
This fixes the problem, and adds additional paranoid bounds checks.
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
We should send a NOTIFICATION message with the Error Code Finite State
Machine Error if we receive NOTIFICATION in OpenSent state
as defined in https://tools.ietf.org/html/rfc4271#section-8.2.2
Signed-off-by: Donatas Abraitis <donatas.abraitis@gmail.com>
bgp nexthop cache update triggers RA for global ipv6
nexthop update.
In case of blackhole route type the outgoing interface
information is NULL which leads to bgpd crash.
Skip sending RA for blackhole nexthop type.
Ticket:CM-27299
Reviewed By:
Testing Done:
Configure bgp neighbor over global ipv6 address.
Configure static blackhole route with prefix includes
connected ipv6 global address.
Upon link flap, zebra sends nexthop update to bgp.
Bgp nexthop cache skips sending RA for blackhole nexthop type.
router bgp 65002
bgp router-id 91.189.93.190
...
neighbor 2001:67c:1360::b peer-group internal
static route:
ipv6 route 2001:67c:1360::/48 Null0 254
iface rowlink.4010
address 91.189.93.190/32
address 2001:67c:1360::a/128
Trigger ifdown rowlink.4010; ifup rowlink.4010
Signed-off-by: Chirag Shah <chirag@cumulusnetworks.com>
It's quite confusing when you see this:
```
exit1-debian-9(config-router)# bgp listen
listen Configure BGP defaults
```
And:
```
exit1-debian-9(config-router)# no bgp listen
listen unset maximum number of BGP Dynamic Neighbors that can be created
```
Signed-off-by: Donatas Abraitis <donatas.abraitis@gmail.com>
When passing a v4 multicast route to a peer send
the v4 nexthop as a preferred methodology.
Fixes: #5582
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Fixes:
```
exit1-debian-9(config-router)# no bgp listen range 192.168.10.0/24 peer-group TEST
% Peer-group does not exist
exit1-debian-9(config-router)#
```
Closes https://github.com/FRRouting/frr/issues/5570
Signed-off-by: Donatas Abraitis <donatas.abraitis@gmail.com>
'NOTIFICATION' string in this message incorrectly implies a BGP
Notification message was the cause of this log. Removing it to
reduce confusion and replacing with function name.
Signed-off-by: Trey Aspelund <taspelund@cumulusnetworks.com>
Before it was:
```
exit1-debian-9# show ip bgp regexp ^200a
Invalid character in as-path access-list ^200a
```
Signed-off-by: Donatas Abraitis <donatas.abraitis@gmail.com>
rfapi_descriptor_rfp_utils.c is already built into libbgp.a and these
include paths have no effect at all.
Signed-off-by: David Lamparter <equinox@diac24.net>
This should keep backward compatibility when bgp show-hostname is
enabled/disabled.
Also show the real originator IP instead of showing fqdn of the route
reflector.
Signed-off-by: Donatas Abraitis <donatas.abraitis@gmail.com>
To keep the calling code agnostic of the DNS resolver libary used, pass
a strerror-style string instead of a status code that would need extra
handling.
Signed-off-by: David Lamparter <equinox@diac24.net>
libc-ares doesn't do IP literals, so we have to do that before running
off to do DNS. Since this isn't BMP specific, move to lib/ so NHRP can
benefit too.
Signed-off-by: David Lamparter <equinox@diac24.net>
Problem: BGP peer pointer is present in keepalive hash table
even when socket has been closed in some race condition.
When keepalive tries to access this peer it asserts.
RCA: Below sequence of events causing assert.
1. Config node peer has went down due to TCP reset
it's FD has been set to -1.
2. Doppelganger peer goes to established state and it has
been added to peer hash table for keepalive when it was
in openconfirm state.
3. Config node parameters including FD are exchanged with
doppelganger. Doppelganger will not have FD -1.
4. Doppelganger will be deleted as part of this it will
remove it from the keepalive peer hash table.
5. While removing from hash table it tries to acquire lock.
6. During this time keepalive thread has the lock and in
a loop trying to send keepalive for peers in hash table.
7. It tries to send keepalive for doppelganger peer with fd
set to -1 and asserts.
Signed-off-by: Santosh P K <sapk@vmware.com>
* Move VNC interning to the appropriate spot
* Use existing bgp_attr_flush_encap to free encap sets
* Assert that refcounts are correct before exiting to keep the demons
contained in their fiery prison
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
Use a per-nexthop flag to indicate the presence of labels; add
some utility zapi encode/decode apis for nexthops; use the zapi
apis more consistently.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
This moves all the DFLT_BGP_* stuff over to the new defaults mechanism.
bgp_timers_nondefault() added to get better file-scoping.
v2: moved everything into bgp_vty.c so that the core BGP code is
independent of the CLI-specific defaults. This should make the future
northbound conversion easier.
Signed-off-by: David Lamparter <equinox@diac24.net>
There's no good reason to have this in bgpd.c; it's just there
historically. Move it to bgp_vty.c where it makes more sense.
Signed-off-by: David Lamparter <equinox@diac24.net>
Add a bit of code to allow hostname lookup failure to
not stall bmp communication.
Fixes: #5382
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The function bgp_table_range_lookup attempts to walk down
the table node data structures to find a list of matching
nodes. We need to guard against the current node from
not matching and not having anything in the child nodes.
Add a bit of code to guard against this.
Traceback that lead me down this path:
Nov 24 12:22:38 frr bgpd[20257]: Received signal 11 at 1574616158 (si_addr 0x2, PC 0x46cdc3); aborting...
Nov 24 12:22:38 frr bgpd[20257]: Backtrace for 11 stack frames:
Nov 24 12:22:38 frr bgpd[20257]: /lib64/libfrr.so.0(zlog_backtrace_sigsafe+0x67) [0x7fd1ad445957]
Nov 24 12:22:38 frr bgpd[20257]: /lib64/libfrr.so.0(zlog_signal+0x113) [0x7fd1ad445db3]1ad445957]
Nov 24 12:22:38 frr bgpd[20257]: /lib64/libfrr.so.0(+0x70e65) [0x7fd1ad465e65]ad445db3]1ad445957]
Nov 24 12:22:38 frr bgpd[20257]: /lib64/libpthread.so.0(+0xf5f0) [0x7fd1abd605f0]45db3]1ad445957]
Nov 24 12:22:38 frr bgpd[20257]: /usr/lib/frr/bgpd(bgp_table_range_lookup+0x63) [0x46cdc3]445957]
Nov 24 12:22:38 frr bgpd[20257]: /usr/lib64/frr/modules/bgpd_rpki.so(+0x4f0d) [0x7fd1a934ff0d]57]
Nov 24 12:22:38 frr bgpd[20257]: /lib64/libfrr.so.0(thread_call+0x60) [0x7fd1ad4736e0]934ff0d]57]
Nov 24 12:22:38 frr bgpd[20257]: /lib64/libfrr.so.0(frr_run+0x128) [0x7fd1ad443ab8]e0]934ff0d]57]
Nov 24 12:22:38 frr bgpd[20257]: /usr/lib/frr/bgpd(main+0x2e3) [0x41c043]1ad443ab8]e0]934ff0d]57]
Nov 24 12:22:38 frr bgpd[20257]: /lib64/libc.so.6(__libc_start_main+0xf5) [0x7fd1ab9a5505]f0d]57]
Nov 24 12:22:38 frr bgpd[20257]: /usr/lib/frr/bgpd() [0x41d9bb]main+0xf5) [0x7fd1ab9a5505]f0d]57]
Nov 24 12:22:38 frr bgpd[20257]: in thread bgpd_sync_callback scheduled from bgpd/bgp_rpki.c:351#012; aborting...
Nov 24 12:22:38 frr watchfrr[6779]: [EC 268435457] bgpd state -> down : read returned EOF
Nov 24 12:22:38 frr zebra[5952]: [EC 4043309116] Client 'bgp' encountered an error and is shutting down.
Nov 24 12:22:38 frr zebra[5952]: zebra/zebra_ptm.c:1345 failed to find process pid registration
Nov 24 12:22:38 frr zebra[5952]: client 15 disconnected. 0 bgp routes removed from the rib
I am not really 100% sure what we are really trying to do with this function, but we must
guard against child nodes not having any data.
Fixes: #5440
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
When dumping a large bit of table data via bgp_show_table
and if there is no information to display for a particular
`struct bgp_node *` the data allocated via json_object_new_array()
is leaked. Not a big deal on small tables but if you have a full
bgp feed and issue a show command that does not match any of
the route nodes ( say `vtysh -c "show bgp ipv4 large-community-list FOO"`)
then we will leak memory.
Before code change and issuing the above show bgp large-community-list command 15-20 times:
Memory statistics for bgpd:
System allocator statistics:
Total heap allocated: > 2GB
Holding block headers: 0 bytes
Used small blocks: 0 bytes
Used ordinary blocks: > 2GB
Free small blocks: 31 MiB
Free ordinary blocks: 616 KiB
Ordinary blocks: 0
Small blocks: 0
Holding blocks: 0
After:
Memory statistics for bgpd:
System allocator statistics:
Total heap allocated: 924 MiB
Holding block headers: 0 bytes
Used small blocks: 0 bytes
Used ordinary blocks: 558 MiB
Free small blocks: 26 MiB
Free ordinary blocks: 340 MiB
Ordinary blocks: 0
Small blocks: 0
Holding blocks: 0
Please note the 340mb of free ordinary blocks is from the fact I issued a
`show bgp ipv4 uni json` command and generated a large amount of data.
Fixes: #5445
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Early exits without appropriate cleanup were causing obscure double
frees and other issues later on in the attribute parsing code. If we
return anything except a hard attribute parse error, we have cleanup and
refcounts to manage.
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
Without with fix we can't delete large-community-list using
no bgp large-community-list standard WORD, but no bgp large-community-list WORD
Let's keep this identical what we have with expanded lists as well.
Signed-off-by: Donatas Abraitis <donatas.abraitis@gmail.com>
This patch allows using sequence numbers for community lists. We already have
this for prefix-lists and access-lists.
Signed-off-by: Donatas Abraitis <donatas.abraitis@gmail.com>
In rare situations, the local route in a VNI may not get selected as the
best route. One situation is during a race between bgp and zebra which
was addressed in a prior commit. This change addresses another situation
where due to a change of tunnel IP, it is possible that a received route
may be selected as the best route if the path selection needs to take
next hop IPs into consideration. This is a pretty convoluted scenario,
but the code should handle it and delete and withdraw the local route
as well as (re)install the received route.
Ticket: CM-24114
Reviewed By: CCR-9487
Testing Done:
1. Manual tests - note, problem is not readily reproducible
2. evpn-smoke - results documented in the ticket
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
Signed-off-by: Chirag Shah <chirag@cumulusnetworks.com>
If a peer advertised capability addpath in their OPEN, but sent us an
UPDATE without an ADDPATH, we overflow a heap buffer.
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
This changeset follows the PR
https://github.com/FRRouting/frr/pull/5334
Above PR adds nexthop tracking support for EVPN RT-5 nexthops.
This route is marked VALID only if the BGP route has a valid nexthop.
If the EVPN peer is an EBGP pee and "disable_connected_check" flag is not set,
"connected" check is performed for the EVPN nexthop.
But, usually EVPN nexthop is not the BGP peering address, but the VTEP address.
Also, NEXTHOP_UNCHANGED flag is enabled by default for EVPN.
As a result, in a common deployment for EVPN, EVPN nexthop is not connected.
Thus, adding a fix to remove the "connected" check for EVPN nexthops.
Signed-off-by: Ameya Dharkar <adharkar@vmware.com>
Instead of CMD_WARNING, use CMD_WARNING_CONFIG_FAILED
for any mis-configuration scenario.
Testing Done:
TOR(config)# router bgp 5548
TOR(config-router)# address-family l2vpn evpn
TOR(config-router-af)# no advertise-pip
This command is supported under L3VNI BGP EVPN VRF
Signed-off-by: Chirag Shah <chirag@cumulusnetworks.com>
when a pip is disabled or mac-vlan is not present
use anycast MAC as RMAC value.
Ticket:CM-26923
Reviewed By:CCR-9417
Testing Done:
Signed-off-by: Chirag Shah <chirag@cumulusnetworks.com>
For self type-2 routes, do not assign system-rmac
as attribute RMAC value if advertise-pip is disable
or macvlan is not present.
Ticket:CM-26923
Reviewed By:CCR-9397
Testing Done:
pip is disabled under bgp vrf2 instance.
Trigger frr-restart.
Before fix:
*> [2]:[0]:[48]:[00:02:00:00:00:2e]:[32]:[45.0.4.4]
36.0.0.11 32768 i
ET:8 RT:5546:1004 RT:5546:4002 Rmac:00:02:00:00:00:2e
After fix:
*> [2]:[0]:[48]:[00:02:00:00:00:2e]:[32]:[45.0.4.4]
36.0.0.11 32768 i
ET:8 RT:5546:1004 RT:5546:4002 Rmac:44:38:39:ff:ff:01
TOR# ifquery vlan1004
auto vlan1004
iface vlan1004
address 45.0.4.4/24
vlan-id 1004
vrf vrf2
VNI: 4002 (known to the kernel)
Type: L3
Tenant VRF: vrf2
RD: 45.0.6.4:3
Originator IP: 36.0.0.11
Advertise-pip: Yes
System-IP: 27.0.0.11
System-MAC: 00:02:00:00:00:2e
Router-MAC: 44:38:39:ff:ff:01
Signed-off-by: Chirag Shah <chirag@cumulusnetworks.com>
By default announct Self Type-2 routes with
system IP as nexthop and system MAC as
nexthop.
An API to check type-2 is self route via
checking ipv4/ipv6 address from connected interfaces list.
An API to extract RMAC and nexthop for type-2
routes based on advertise-svi-ip knob is enabled.
When advertise-pip is enabled/disabled, trigger type-2
route update. For self type-2 routes to use
anycast or individual (rmac, nexthop) addresses.
Ticket:CM-26190
Reviewed By:
Testing Done:
Enable 'advertise-svi-ip' knob in bgp default instance.
the vrf instance svi ip is advertised with nexthop
as default instance router-id and RMAC as system MAC.
Signed-off-by: Chirag Shah <chirag@cumulusnetworks.com>
In L3VNI add callback parse, vrr rmac value.
For non-zero vrr mac value, use it as anycast RMAC
and svi mac as individual rmac value.
If advertise-pip is disable or vrr rmac is not present
use svi mac as anycast rmac value for all routes.
Ticket:CM-26190
Reviewed By:
Testing Done:
Signed-off-by: Chirag Shah <chirag@cumulusnetworks.com>
Evpn Primary IP advertisement feature uses
individual system IP and system MAC for prefix (type-5)
and self type-2 routes.
The PIP knob is enabled by default for bgp vrf instance.
Configuration CLI for enable/disable PIP feature knob.
User can configure PIP system IP and MAC to retain as
permanent values.
For the PIP IP, the default behavior is to accept bgp default
instance's router-id. When the default instance router-id change,
reflect PIP IP assignment.
Reflect type-5 to use system-IP and system MAC as nexthop and RMAC
values.
Ticket:CM-26190
Signed-off-by: Chirag Shah <chirag@cumulusnetworks.com>
Spaces were not being accounted for in the heap buffer sizing, leading
to a heap buffer overflow when encoding large communities to their
string representations.
This patch also uses safer functions to do the encoding instead of
pointer math.
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
Tons of insane just-so pointer math here where it is not needed. This is
too smart. Use safer methods.
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
The half and reuse variables can never be 1 but the
SA systems we have do not know this and think it is possible.
Provide the kick in the snarples that the SA needs to know
this is not true.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Copy paste leads to invalid read of 1 byte off the heap when converting
extended community attributes into strings.
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
Bug: While preparing the JSON output, 2 loops are traversed: the outer loop
loops through RD, and the inner loop loops through the prefixes of that RD.
We hit the bug (printing blank RD and stale or null prefix info) when the inner
loop exits with nothing to print, (without allocating json_routes) and the outer
loop still tries to attach it to the parent, json_adv. Thus, we have
key=<BLANK RD>, value=<junk or prev json_routes>
The fix: Avoid attaching json_routes to the parent json if there
is nothing to print.
Signed-off-by: Lakshman Krishnamoorthy <lkrishnamoor@vmware.com>
According to RFC 8277 IPv4 LU NLRI can be withdrawn using label 0x000000.
This RFC updates RFC3101 where it should be done only with 0x800000 label value.
Juniper implementation sets value 0x000000 when prefix is being withdrawn.
Page 12 RFC8277 states:
[RFC3107] also made it possible to withdraw a binding without
specifying the label explicitly, by setting the Compatibility field
to 0x800000. However, some implementations set it to 0x000000. In
order to ensure backwards compatibility, it is RECOMMENDED by this
document that the Compatibility field be set to 0x800000, but it is
REQUIRED that it be ignored upon reception.
Now FRR drops BGP session when receives such BGP update.
Signed-off-by: Aleksandr Klimenko <v00lk@bk.ru>
* IPv6 routes received via a ibgp session with one of its own interface as
nexthop are getting installed in the BGP table.
*A common table to be implemented should take cares of both
ipv4 and ipv6 connected addresses.
Signed-off-by: Biswajit Sadhu sadhub@vmware.com
The command `show ip bgp ipv4|ipv6 vpn neighbors <ip> prefix-counts`
caused a segfault, because the 2-level routing was not accounted for.
Signed-off-by: Juergen Werner <juergen@opensourcerouting.org>
The CLI was not parsing prefix format of ipv6 address.
This fixes the bug: https://github.com/FRRouting/frr/issues/5322
Signed-off-by: Lakshman Krishnamoorthy <lkrishnamoor@vmware.com>
The total_peercount table was created as a short cut for queries about
if addpath was enabled at all on a particular afi/safi. However, the
values weren't updated, so BGP would act as if addpath wasn't enabled
when determining if updates should be sent out. The error in behavior
was much more noticeable in tx-all than best-per-as, since changes in
what is sent by best-per-as would often trigger updates even if addpath
wasn't enabled.
Signed-off-by: Mitchell Skiba <mskiba@amazon.com>