This introduces the option for a user to look up one specific prefix in
the advertised-routes or received-routes table of a peer.
Signed-off-by: Trey Aspelund <taspelund@nvidia.com>
```
anlan(config-router-af)# vni 33
anlan(config-router-af-vni)# route-target both 44:55
anlan(config-router-af-vni)# no route-target both 44:55
vtysh: error reading from bgpd: Resource temporarily unavailable (11)Warning: closing connection to bgpd because of an I/O error!
```
When `bgp_evpn_vni_rt_cmd` handles the "both" type, it wrongly creates
only one node (there should be two) and adds it to both the
`vpn->import_rtl` and `vpn->export_rtl` lists. From that point on, the two
lists are already wrong.
`no route-target both RT` then frees that single node from both
`vpn->import_rtl` and `vpn->export_rtl`. Once it has been freed from
`vpn->import_rtl`, freeing it again from `vpn->export_rtl` is a
use-after-free, which sometimes causes a crash or other unexpected behaviour.
This issue was introduced by commit `3b7e8d`, which adjusted both
`bgp_evpn_vni_rt_cmd` and `bgp_evpn_vrf_rt_cmd`.
Since `bgp_evpn_vrf_rt_cmd`/`no_bgp_evpn_vrf_rt_cmd` unintentionally started
working again with commit `7022da`, only `bgp_evpn_vni_rt_cmd` needs to be
modified: add two nodes for the "both" type, plus some explicit comments for
this special case.
Signed-off-by: anlan_cs <vic.lan@pica8.com>
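The ownership rule behind this fix can be sketched outside of FRR. The following is a minimal illustration, not the bgpd code: `struct rt`, `struct rt_list`, and all `rt_*` helpers are hypothetical stand-ins for the real ecommunity and list types. It assumes each list owns (and frees) the nodes stored in it, which is why "both" has to allocate one node per list.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a route-target value (not FRR's ecommunity). */
struct rt {
	char val[32];
};

/* Hypothetical fixed-size list that owns the nodes stored in it. */
struct rt_list {
	struct rt *items[8];
	size_t count;
};

static struct rt *rt_new(const char *val)
{
	struct rt *rt = calloc(1, sizeof(*rt));

	assert(rt != NULL);
	strncpy(rt->val, val, sizeof(rt->val) - 1);
	return rt;
}

static void rt_list_add(struct rt_list *l, struct rt *rt)
{
	assert(l->count < 8);
	l->items[l->count++] = rt;
}

/* Remove and free the first node matching val; returns 1 if one was found. */
static int rt_list_del(struct rt_list *l, const char *val)
{
	for (size_t i = 0; i < l->count; i++) {
		if (strcmp(l->items[i]->val, val) != 0)
			continue;
		free(l->items[i]);
		l->items[i] = l->items[--l->count];
		return 1;
	}
	return 0;
}

/* "both": one node per list, so neither list aliases the other's memory. */
static void rt_add_both(struct rt_list *import_rtl, struct rt_list *export_rtl,
			const char *val)
{
	rt_list_add(import_rtl, rt_new(val));
	rt_list_add(export_rtl, rt_new(val));
}

/* "no ... both": each list frees its own node exactly once. */
static void rt_del_both(struct rt_list *import_rtl, struct rt_list *export_rtl,
			const char *val)
{
	rt_list_del(import_rtl, val);
	rt_list_del(export_rtl, val);
}
```

Had `rt_add_both` stored the same pointer in both lists, the second `rt_list_del` call would dereference and free memory that the first call had already released.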
BGP was modified in a0b937de42
to grab the peer->io_mtx before validating the header to ensure
that the input Queue was not being modified by anyone else at that
moment in time. Unfortunately validate_header can detect a problem
and attempt to relock the mutex, which deadlocks. At first only the
bgp_io pthread is deadlocked. Eventually, though, bgp attempts to
write another packet to the peer (say, when it's time to send the
next packet), the main pthread of bgpd deadlocks as well, and the
whole bgpd process is stuck at that point, leaving us dead in the water.
The point of locking the mutex earlier was to ensure that the input
Queue wasn't being modified by anyone else (say, reading off it),
because we wanted to ensure that we don't hold more packets than necessary.
Let's grab the mutex just long enough to look at the input Q size. This
ensures that we have room, and then we can validate_header and do the right
thing from there. We'll need to lock the mutex again when we actually move
the packet into the input Q as well.
Fixes: #12725
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
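The locking order described above can be sketched as follows. This is a simplified model, not the bgp_io code: `struct peer_io`, `report_error`, `read_packet`, and `INQ_LIMIT` are hypothetical names, the queue is reduced to a counter, and the only header check is the 19-byte minimum BGP header length. It assumes the error path takes the same non-recursive mutex, which is what made holding the lock across validation unsafe.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define INQ_LIMIT 4 /* hypothetical limit, for illustration only */

struct peer_io {
	pthread_mutex_t io_mtx;
	size_t inq_count; /* stands in for the input queue */
	int last_error;   /* written by the error path */
};

/* Error path that takes io_mtx itself. Calling it with io_mtx already held
 * would deadlock on a default (non-recursive) mutex. */
static void report_error(struct peer_io *p, int code)
{
	pthread_mutex_lock(&p->io_mtx);
	p->last_error = code;
	pthread_mutex_unlock(&p->io_mtx);
}

/* Simplified check: a BGP header is at least 19 bytes long. */
static bool validate_header(struct peer_io *p, size_t len)
{
	if (len < 19) {
		report_error(p, 1);
		return false;
	}
	return true;
}

/* Returns true if the packet was queued. */
static bool read_packet(struct peer_io *p, size_t len)
{
	bool have_room;

	assert(p != NULL);

	/* 1. Hold the lock only long enough to look at the queue size. */
	pthread_mutex_lock(&p->io_mtx);
	have_room = p->inq_count < INQ_LIMIT;
	pthread_mutex_unlock(&p->io_mtx);
	if (!have_room)
		return false;

	/* 2. Validate with the lock released, so the error path can lock. */
	if (!validate_header(p, len))
		return false;

	/* 3. Take the lock again only to move the packet into the queue. */
	pthread_mutex_lock(&p->io_mtx);
	p->inq_count++;
	pthread_mutex_unlock(&p->io_mtx);
	return true;
}
```

The size check and the enqueue are no longer one atomic step. In this sketch that is acceptable on the assumption that only this thread adds to the queue while other threads only drain it, so the room observed in step 1 cannot shrink before step 3.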
Commit: 3cdb03fba7
changed the vty_json output to not be pretty printed.
The previous commit in the tree added vty_json_no_pretty;
let's use that instead.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Initial commit: 23b2a7ef52
changed the json output of `show bgp <afi> <safi> json` to
not be pretty printed, because with a large number of routes
and large-scale ecmp the show output was taking forever, and
that commit cut 2 minutes out of vtysh run time.
Subsequent commit: f4ec52f7cc
changed this back.
After upgrading to the latest version, testing revealed the
long run time again. Let's add back this functionality so that
FRR can have reduced run times with vtysh when it's really
needed.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Before this patch, we always passed `struct attr` for NLRI_UPDATE, but if we
have a situation with treat-as-withdraw (for example: malformed attribute, or
using a command like `neighbor path-attribute treat-as-withdraw`) the route
MUST be withdrawn from the BGP table.
Hence, we MUST pass attr as NULL in this case. The NLRI_ATTR_ARG() macro
already performs this check, so just reuse it properly.
Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
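The idea of passing NULL attributes to force a withdraw can be sketched like this. It is an illustration only: `nlri_attr_arg`, `nlri_update`, `struct route`, and the `parse_ret` enum are hypothetical and do not reproduce the definition of the real NLRI_ATTR_ARG() macro or the bgpd update path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical result of attribute parsing. */
enum parse_ret { PARSE_OK, PARSE_WITHDRAW };

/* Hypothetical, heavily reduced attribute set. */
struct attr {
	unsigned int med;
};

/* Hypothetical table entry. */
struct route {
	bool present;
	struct attr attr;
};

/* Hand the attributes on only if the update was not treated as withdraw. */
static const struct attr *nlri_attr_arg(enum parse_ret ret,
					const struct attr *attr)
{
	return ret == PARSE_WITHDRAW ? NULL : attr;
}

/* A NULL attr means "withdraw"; otherwise install or refresh the route. */
static void nlri_update(struct route *r, const struct attr *attr)
{
	assert(r != NULL);
	if (attr == NULL) {
		r->present = false;
		return;
	}
	r->present = true;
	r->attr = *attr;
}
```

Keeping the decision in one helper means every caller gets the withdraw behaviour without repeating the check.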
The function ecommunity_str2com_internal appears to want to handle
the ecommunity_token_rt6 enum but skips over it. Commit
9a659715df tried to add this but I really
don't see how this is going to behave correctly. Add the
ecommunity_token_rt6 case to the switch statement so it is handled
appropriately?
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Before this, if the peer stopped sending the FQDN capability, the old (stale)
hostname still existed and was misleading in the output of `show bgp ...`,
especially when used with `bgp default show-hostname`, etc.
Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
These two functions always return 0. As such, any and all
tests against their return value make no sense. Change the
return type to void and follow the chain, logically, to
remove all the dead code.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Moves the old/new IP comparison into handle_tunnel_ip_change instead of
expecting the caller to do the check on their own.
Also changes handle_tunnel_ip_change to return void since it only ever
returned 0 in all cases.
Signed-off-by: Trey Aspelund <taspelund@nvidia.com>
When processing a new local VNI, we were always walking the global EVPN
table to look for routes that needed to be removed due to a martian
nexthop change (specifically a tunnel-ip change).
Since the martian TIP table is global (all VNIs) + the walk is also in
the global table (all VNIs), we can trust that any new TIP from any VNI
would result in routes getting removed from the global table and
unimported from all live (L2)VNIs.
In other words, the only time this update is actionable is when we are
adding an IP to or removing an IP from the martian TIP table; we do not
need to walk the table for normal refcount adjustments.
Signed-off-by: Trey Aspelund <taspelund@nvidia.com>
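A refcounted table that triggers the walk only on the first add and the last remove can be sketched as below. Everything here is hypothetical (`struct tip_table`, `tip_add`, `tip_del`, and the `walks` counter that stands in for the global EVPN table walk); it is meant to show the shape of the check, not the bgpd implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TIP_MAX 16 /* hypothetical capacity */

struct tip_entry {
	uint32_t addr;
	unsigned int refcnt;
};

struct tip_table {
	struct tip_entry e[TIP_MAX];
	size_t n;
	unsigned int walks; /* counts how often the expensive walk ran */
};

static struct tip_entry *tip_find(struct tip_table *t, uint32_t addr)
{
	for (size_t i = 0; i < t->n; i++)
		if (t->e[i].addr == addr)
			return &t->e[i];
	return NULL;
}

/* Stand-in for walking the global table. */
static void walk_global_table(struct tip_table *t)
{
	t->walks++;
}

static void tip_add(struct tip_table *t, uint32_t addr)
{
	struct tip_entry *e = tip_find(t, addr);

	if (e != NULL) {
		e->refcnt++; /* plain refcount bump: no walk needed */
		return;
	}
	assert(t->n < TIP_MAX);
	t->e[t->n++] = (struct tip_entry){ .addr = addr, .refcnt = 1 };
	walk_global_table(t); /* the IP is new to the table */
}

static void tip_del(struct tip_table *t, uint32_t addr)
{
	struct tip_entry *e = tip_find(t, addr);

	if (e == NULL)
		return;
	if (--e->refcnt > 0)
		return; /* still referenced by another VNI: no walk needed */
	*e = t->e[--t->n];
	walk_global_table(t); /* the IP left the table */
}
```

With two VNIs sharing one tunnel IP, this performs two walks in total (first add, last remove) instead of one per VNI event.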
We do use non-constant/literal format strings in a few places for more
or less valid reasons; put `ignored "-Wformat-nonliteral"` around those
so we can have the warning enabled for everywhere else.
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
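For reference, suppressing the warning around a single call looks roughly like this. `format_msg` is a hypothetical helper written for this example; the `#pragma GCC diagnostic` directives are the standard GCC mechanism (also honoured by clang) for scoping a warning override.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper that formats with a caller-supplied, non-literal
 * format string. Without the pragmas, -Wformat-nonliteral flags the
 * vsnprintf() call because fmt is not a string literal. */
static int format_msg(char *buf, size_t len, const char *fmt, ...)
{
	va_list ap;
	int ret;

	assert(buf != NULL && fmt != NULL);

	va_start(ap, fmt);
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wformat-nonliteral"
	ret = vsnprintf(buf, len, fmt, ap);
#pragma GCC diagnostic pop
	va_end(ap);
	return ret;
}
```

The push/pop pair keeps the override local to the one statement, so the warning stays active for the rest of the file.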
This fix updates the nexthop length of a bgp update to be
transmitted to a remote peer. Before the previous commit,
the ipv6 nexthop length was internally set to 32 bytes, which
was not correct: it should be 48 bytes to conform to the
vpnv6 encoding format.
However, before the previous commit, even though the nexthop
length was internally set to 32, the real nexthop length was
48 bytes, and everything was operating ok.
Now, if we use the following route-map and apply it outbound
for the vpnv6 address family, a malformed packet is detected
and the peering breaks.
> route-map rmap permit 1
> set ipv6 next-hop global 5:5::3:6
> set ipv6 next-hop local fe80:55::333:222
Keep mp_nexthop_len at 48 bytes if it was already set
to 48 previously.
Fixes: 35ac9b53f2 ("bgpd: fix vpnv6 nexthop encoding")
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
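The rule "keep 48 if it was already 48" can be written as a small helper. This is a sketch under assumptions: the constant names and `nhlen_after_set_global_and_local` are hypothetical, and the function only models the length bookkeeping when a route-map sets both a global and a link-local next hop, not the actual route-map code.

```c
#include <assert.h>
#include <stdint.h>

enum {
	NHLEN_IPV6_GLOBAL_AND_LL = 32,  /* two 16-byte IPv6 addresses */
	NHLEN_VPNV6_GLOBAL_AND_LL = 48, /* two (8-byte RD + 16-byte IPv6) */
};

/* The VPNv6 form prefixes each address with an 8-byte route distinguisher. */
static_assert(NHLEN_VPNV6_GLOBAL_AND_LL == 2 * (8 + 16),
	      "vpnv6 nexthop carries an RD in front of each address");

/* Length to record after setting global + link-local next hops. */
static uint8_t nhlen_after_set_global_and_local(uint8_t cur_len)
{
	/* Already in the VPNv6 global + link-local form: keep it. */
	if (cur_len == NHLEN_VPNV6_GLOBAL_AND_LL)
		return cur_len;
	return NHLEN_IPV6_GLOBAL_AND_LL;
}
```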
Use the non-pretty form of display for the BGP evpn route
table detail json.
Problem:
In a scaled evpn setup, the route table detail json dump
consumes a lot of system resources (CPU + memory).
Dumping the routes in pretty form at high scale hogs the
CPU for a while, which can trigger watchfrr to kill bgpd.
Solution:
Avoid pretty JSON print for the detail version of the dump.
Signed-off-by: Chirag Shah <chirag@nvidia.com>
This change updates the nexthop attribute length
according to the safi used. With the previous
commit, the calculated length was not aligned
with the real nexthop length. Such a packet was
treated as malformed by the remote peer, which
broke the vpnv6 peering.
Fix this by updating the real nexthop length
appropriately.
Fixes: 35ac9b53f2 ("bgpd: fix vpnv6 nexthop encoding")
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Use the correct function parameter types to avoid truncation and
sign issues.
Found by Coverity Scan (CID 1519802)
Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
In ipv6 vpn, when both the global and the local ipv6 addresses are
received, the nexthop attribute length must still be 48 bytes when
re-transmitting the bgp ipv6 update.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
RFC7611 introduces the new ACCEPT_OWN extended community, which was
already implemented for FRR in the previous PR. However, that PR broke
compatibility when importing VPN routes.
Let's consider the following situation. There are 2 routers connected
by an iBGP session. Each router has two VRFs, vrf10 and vrf20, with
RD 0:10 and RD 0:20 configured as the route distinguishers of vrf10
and vrf20 respectively.
+- R1 --------+    +- R2 --------+
| +---------+ |    | +---------+ |
| |  VRF10  | |    | |  VRF10  | |
| | RD 0:10 +--------+ RD 0:10 | |
| +---------+ |    | +---------+ |
| +---------+ |    | +---------+ |
| |  VRF20  +--------+  VRF20  | |
| | RD 0:20 | |    | | RD 0:20 | |
| +---------+ |    | +---------+ |
+-------------+    +-------------+
In this situation, the VPN routes from R1's VRF10 should be imported to
R2's VRF10 and the VPN routes from R2's VRF10 should be imported to R2's
VRF20. However, the current implementation of ACCEPT_OWN always
rejects routes if the RD of the VPN route matches the RD of the VRF.
Similar issues happen in local VRF2VRF route leaks. In such cases,
the route leaked from VRF10 should be imported to VRF20. However, the
current implementation of ACCEPT_OWN does not permit this.
+- R1 ---------------------+
|      +------------+      |
| +----v----+  +----v----+ |
| |  VRF10  |  |  VRF20  | |
| | RD 0:10 |  | RD 0:10 | |
| +---------+  +---------+ |
+--------------------------+
So, this commit adds an additional condition to the RD match: if the
route doesn't have the ACCEPT_OWN extended community, the source VRF
check is skipped.
[RFC7611]: https://datatracker.ietf.org/doc/html/rfc7611
Signed-off-by: Ryoga Saito <ryoga.saito@linecorp.com>
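The added condition can be sketched as a predicate. This is an illustration, not the import code: `struct vpn_route`, `rd_same`, and `blocked_by_source_vrf_check` are hypothetical names, and the RD is modelled as a plain 8-byte array.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define RD_LEN 8 /* a route distinguisher is 8 bytes */

/* Hypothetical, reduced view of a VPN route. */
struct vpn_route {
	uint8_t rd[RD_LEN];
	bool has_accept_own; /* carries the ACCEPT_OWN extended community */
};

static bool rd_same(const uint8_t *a, const uint8_t *b)
{
	return memcmp(a, b, RD_LEN) == 0;
}

/* Returns true if importing r into the VRF with vrf_rd must be refused
 * because the route would go back into the VRF it came from. */
static bool blocked_by_source_vrf_check(const struct vpn_route *r,
					const uint8_t *vrf_rd)
{
	assert(r != NULL && vrf_rd != NULL);

	/* Without ACCEPT_OWN the source VRF check is skipped entirely. */
	if (!r->has_accept_own)
		return false;
	return rd_same(r->rd, vrf_rd);
}
```

In the two-router example, a route with RD 0:10 and no ACCEPT_OWN community passes this check for a VRF that also uses RD 0:10, which restores the previous import behaviour.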
The input queue limit does not belong under router bgp. This
is a dev escape and should just be removed.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Consider this scenario:
Lots of peers with a bunch of route information that is changing
fast. One of the peers happens to be really slow for whatever
reason. The way the output queue is filled is that bgpd puts
64 packets at a time and then reschedules itself to send more
in the future. Now suppose that peer has hit its input Queue
limit and is slow. bgp will still continue to add data to
the output Queue, regardless of whether the other side is
receiving this data.
Let's limit the Output Queue to the same limit as the Input
Queue. This should prevent bgp from eating up large amounts of
memory as stream data when under severe network trauma.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
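A bounded fill loop along these lines might look as follows. It is a sketch, not the bgpd packet writer: `struct peer_q` and `fill_outq` are hypothetical, the queue is reduced to a counter, and only the batch size of 64 comes from the description above.

```c
#include <assert.h>
#include <stddef.h>

#define WRITE_BATCH 64 /* packets generated per scheduling pass */

/* Hypothetical per-peer queue accounting. */
struct peer_q {
	size_t outq_count; /* packets currently waiting to be sent */
	size_t outq_limit; /* assumed equal to the input queue limit */
};

/* Queue up to one batch of pending packets, but never beyond the output
 * queue limit. Returns how many packets were added in this pass. */
static size_t fill_outq(struct peer_q *p, size_t pending)
{
	size_t added = 0;

	assert(p != NULL);
	while (added < WRITE_BATCH && added < pending &&
	       p->outq_count < p->outq_limit) {
		p->outq_count++;
		added++;
	}
	return added;
}
```

When the peer is slow and nothing drains the queue, later passes add fewer packets and eventually none, so memory held as queued stream data stays bounded by the limit.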