Delay BGP configuration until we receive the end-configuration hook, to make
sure we don't send partial updates to the peer, which leads to a broken
Graceful-Restart.
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
Since `RB_INSERT()` is called only after the entry was not found in the RB
tree, it MUST succeed and return zero. The checks on the return value of
`RB_INSERT()` are redundant; just remove them.
Signed-off-by: anlan_cs <vic.lan@pica8.com>
The EAD-per-ES route can be fragmented to fit the EVIs on the switch. This
command allows the EVI limit to be configured -
!
router bgp 5556
!
address-family l2vpn evpn
ead-es-frag evi-limit 200
exit-address-family
!
!
Signed-off-by: Anuradha Karuppiah <anuradhak@nvidia.com>
The EAD-per-ES route carries ECs for all the ES-EVI RTs. As the number of VNIs
increases, all the RTs no longer fit into a standard BGP UPDATE (4K), so the
route needs to be fragmented.
Each fragment is associated with a separate RD and frag-id -
1. Local ES-per-EAD -
ES route table - {ES-frag-ID, ESI, ET=0xffffffff, VTEP-IP}
global route table - {RD-=ES-frag-RD, ESI, ET=0xffffffff}
2. Remote ES-per-EAD -
VNI route table - {ESI, ET=0xffffffff, VTEP-IP}
global route table - {RD-=ES-frag-RD, ESI, ET=0xffffffff}
Note: The fragment ID is abandoned in the per-VNI routing table. At this
point that is acceptable as we don't expect more than one-ES-per-EAD fragment
to be imported into the per-VNI routing table. But that may need to be
re-worked at a later point.
CLI changes (sample with 4 VNIs per-fragment for experimental purposes) -
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
root@torm-11:mgmt:~# vtysh -c "show bgp l2vpn evpn es 03:44:38:39:ff:ff:01:00:00:01"
ESI: 03:44:38:39:ff:ff:01:00:00:01
Type: LR
RD: 27.0.0.21:3
Originator-IP: 27.0.0.21
Local ES DF preference: 50000
VNI Count: 10
Remote VNI Count: 10
VRF Count: 3
MACIP EVI Path Count: 33
MACIP Global Path Count: 198
Inconsistent VNI VTEP Count: 0
Inconsistencies: -
Fragments: >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
27.0.0.21:3 EVIs: 4
27.0.0.21:13 EVIs: 4
27.0.0.21:22 EVIs: 2
VTEPs:
27.0.0.22 flags: EA df_alg: preference df_pref: 32767
27.0.0.23 flags: EA df_alg: preference df_pref: 32767
root@torm-11:mgmt:~# vtysh -c "show bgp l2vpn evpn es-evi vni 1002 detail"
VNI: 1002 ESI: 03:44:38:39:ff:ff:01:00:00:01
Type: LR
ES fragment RD: 27.0.0.21:13 >>>>>>>>>>>>>>>>>>>>>>>>>
Inconsistencies: -
VTEPs: 27.0.0.22(EV),27.0.0.23(EV)
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
PS: The number of EVIs per-fragment has been set to 128 and may need further
tuning.
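As a rough illustration of the fragmentation arithmetic only (a sketch, not
the bgpd implementation; `frag_count` and `frag_evi_count` are hypothetical
helpers), splitting 10 EVIs with a limit of 4 per fragment gives fragments
of 4, 4 and 2 EVIs, which is the split shown in the sample output above:

```c
#include <stdint.h>

/* Hypothetical helper: number of fragments needed for `evi_cnt` EVIs
 * when each fragment may carry at most `evi_limit` EVIs.
 */
static uint32_t frag_count(uint32_t evi_cnt, uint32_t evi_limit)
{
	if (evi_limit == 0)
		return 0; /* invalid limit; the caller must reject it */
	return evi_cnt / evi_limit + (evi_cnt % evi_limit != 0);
}

/* Hypothetical helper: number of EVIs carried by fragment `idx` (0-based). */
static uint32_t frag_evi_count(uint32_t evi_cnt, uint32_t evi_limit,
			       uint32_t idx)
{
	uint32_t frags = frag_count(evi_cnt, evi_limit);

	if (idx >= frags)
		return 0;
	if (idx + 1 < frags)
		return evi_limit; /* every fragment but the last is full */
	return evi_cnt - idx * evi_limit;
}
```

With the default of 128 EVIs per fragment the same arithmetic applies; only
the limit changes.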
Ticket: #2632967
Signed-off-by: Anuradha Karuppiah <anuradhak@nvidia.com>
This is an alternative to EAD route fragmentation and allows the user to limit
the route to a single UPDATE (<4K) independent of the number of EVIs.
Sample config (add one l2-vni RT from each VRF) -
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
!
router bgp 5556
!
address-family l2vpn evpn
ead-es-route-target export 5556:1001
ead-es-route-target export 5556:1004
ead-es-route-target export 5556:1008
exit-address-family
!
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
Sample route
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
Network Next Hop Metric LocPrf Weight Path
*> [1]:[4294967295]:[03:44:38:39:ff:ff:01:00:00:01]:[32]:[27.0.0.21]
27.0.0.21 32768 i
ET:8 ESI-label-Rt:AA RT:5556:1001 RT:5556:1004 RT:5556:1008
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
When configured, the ead-es-route-target is used instead of
the auto-generated version that includes all associated EVI's RTs.
Ticket: #2632967
Signed-off-by: Anuradha Karuppiah <anuradhak@nvidia.com>
Importing local ES routes should be skipped, but the existing check is
slightly wrong. It is fine that local ES routes can't be imported, but
attempting to import one wrongly enters the `uninstall` procedure.
Adjust the check to make this clear: return immediately when asked to
import a local ES route.
Signed-off-by: anlan_cs <vic.lan@pica8.com>
`bgp_evpn_import_route_in_vrfs()` is a special (l2vpn) form of
`install_uninstall_evpn_route()` with the `AFI_L2VPN` and `SAFI_EVPN` family.
It has no callers, so just remove it.
Signed-off-by: anlan_cs <vic.lan@pica8.com>
Currently the nexthop tracking code only sends the requestor
the prefix it was asked to match against. When the nexthop tracking
code was simplified to not need an import check and a nexthop check
in b8210849b8 for bgpd, it was not noticed that a longer prefix
could match but would still be seen as a match, because FRR was not
sending up both the resolved route prefix and the route FRR was
asked to match against.
This code change causes the nexthop tracking code to pass
back up the matched requested route (so that the calling
protocol can figure out which one it is being told about)
as well as the actual prefix that was matched to.
Fixes: #10766
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
sockopt_cork is a no-op function that was cleaned up
in 2017. Since then it's still not being used. At
this point in time there is little point in keeping a
dead function that will not be used because of vagaries
between platforms.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
If you enter:
router bgp 325
bgp graceful-restart
bgp graceful-restart
!
The second command entered will fail. This should not
fail, as it's a no-op.
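A minimal sketch of the intended behavior, assuming a simplified instance
structure and return codes (`struct bgp_instance`, `gr_enable` and
`enum cmd_result` are hypothetical, not the bgpd types): re-entering an
already applied setting succeeds without changing anything.

```c
#include <stdbool.h>

/* Hypothetical stand-ins; not the real bgpd structures or return codes. */
enum cmd_result { CMD_OK = 0, CMD_FAIL = 1 };

struct bgp_instance {
	bool graceful_restart;
};

/* Enabling an already enabled knob is a no-op and still succeeds. */
static enum cmd_result gr_enable(struct bgp_instance *bgp)
{
	if (bgp->graceful_restart)
		return CMD_OK; /* nothing to change */

	bgp->graceful_restart = true;
	return CMD_OK;
}
```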
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Commit: ea47320b1d
Modified the bgp_clear_stale_route function to have
better indentation, but in the process changed some
`continue;` statements to `break;` which modified
the looping and caused stale paths to not always be
removed upon an update.
To reproduce: A ---- B, set up with addpath and GR.
One side has a prefix with nhop1 and nhop2; kill one
side and then resend the same prefix with nhop3.
Paths nhop1 and nhop2 become stale and are never removed.
Code inspection clearly shows that the `continue`
statements became `break` statements causing the
loop over all paths to stop prematurely.
The fix is to change the break back to continue
statements so the loop can continue instead of
stopping.
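The difference can be sketched on a simplified loop (hypothetical
`struct path` and `clear_stale`, not the real bgp_clear_stale_route code):
a non-stale path must be skipped with `continue`, otherwise the stale
paths after it are never visited.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified path entry; not bgpd's struct bgp_path_info. */
struct path {
	bool stale;
	bool removed;
};

/* Mark every stale path as removed. Non-stale paths are skipped with
 * `continue` so the loop still visits the remaining paths.
 */
static size_t clear_stale(struct path *paths, size_t n)
{
	size_t cleared = 0;

	for (size_t i = 0; i < n; i++) {
		if (!paths[i].stale)
			continue; /* a `break` here would end the scan early */

		paths[i].removed = true;
		cleared++;
	}
	return cleared;
}
```

With a fresh path (nhop3) listed ahead of two stale ones (nhop1, nhop2),
the `continue` form removes both stale paths; a `break` at the same spot
would remove none.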
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
When BGP detects that a peering is using a global address but no v6 LL
address has been created for the interface that the global address is
on, warn the user that something is amiss and they need to fix it.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Bad formatting was applied, and it worked with a small number of prefixes
(the bug was lurking). With a full BGP feed and a full RPKI table, this causes
an infinite loop.
Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
Before:
(config-router-af)# advertise-all-vni
% Please unconfigure EVPN in VRF (null)
After:
(config-router-af)# advertise-all-vni
% Please unconfigure EVPN in VRF default
Just use `bgp->name_pretty` to make it pretty.
Signed-off-by: anlan_cs <vic.lan@pica8.com>
This patch adds transposition_offset and transposition_len to bgp_sid_info,
and transposes SID only at bgp_zebra_announce.
Signed-off-by: Ryoga Saito <ryoga.saito@linecorp.com>
This patch changes the format of the Prefix-SID advertised by
bgpd. In the current implementation, transposed SIDs were
advertised, which caused two problems:
1. A bgpd that receives SRv6 L3VPN routes whose SIDs are
transposed couldn't share one bgp_attr_srv6_l3vpn between
those routes. This leads to extra memory consumption.
2. Some implementations will reject a route with a transposed SID.
This affects interoperability.
For those reasons, instead of advertising the
transposed SID, this patch advertises the locator of the SID.
Signed-off-by: Ryoga Saito <ryoga.saito@linecorp.com>
For the later patches, this patch changes the behavior of alloc_new_sid
so that bgpd records not only the SID for the VRF, but also the locator of
the SID.
Signed-off-by: Ryoga Saito <ryoga.saito@linecorp.com>
Conversion of bgp error codes returned for cli input into
an enum and then properly handling all the error cases
in bgp_vty_return.
Because not all error codes returned were properly handled
in this function there existed configuration examples that
were accepted on the cli without an error message but not
saved.
Fixes: #10589
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
BGP_ERR_PEER_GROUP_MEMBER and BGP_ERR_PEER_GROUP_PEER_TYPE_DIFFERENT
are both not handled by bgp_vty_return, but both can be handled by
this function as there is nothing special going on here.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Confederations are checking to see that the bgp pointer
is non-null. But it's impossible to have a null pointer
in the cli, and in all paths we have already deref'ed the bgp
pointer. Let's remove that error code as it can never
happen.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Add a 15 minute warning to the logging system when
bgp policy is not set up properly. Operators keep asking
about the missing policy (typically on upgrade). Let's
try to give them a bit more of a hint when something is
going wrong, as they are clearly missing the other
various places FRR tells them about it.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
bgp_attr_undup does the same thing as bgp_attr_flush – frees the
temporary data that might be allocated when applying a route-map. There
is no need to have two separate functions for that.
Signed-off-by: Igor Ryzhov <iryzhov@nfware.com>
bgp_attr_default_set creates a new empty aspath. If family error happens,
this aspath is not freed. Move attr initialization after we checked the
family.
Signed-off-by: Igor Ryzhov <iryzhov@nfware.com>
Using memcmp is wrong because struct nexthop may contain uninitialized
padding bytes that should not be compared.
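To illustrate the problem on a made-up type (`struct nh_example` and
`nh_example_same` are hypothetical, not the real nexthop code): two
objects with identical fields can still differ in their padding bytes,
so the comparison has to go field by field.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical struct: most ABIs insert padding after `type` so that
 * `addr` is 4-byte aligned. The padding bytes have unspecified values.
 */
struct nh_example {
	uint8_t type;
	uint32_t addr;
};

/* Compare only the named fields, never the padding. */
static bool nh_example_same(const struct nh_example *a,
			    const struct nh_example *b)
{
	return a->type == b->type && a->addr == b->addr;
}
```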
Signed-off-by: Igor Ryzhov <iryzhov@nfware.com>
Using memcmp is wrong because struct ipaddr may contain uninitialized
padding bytes that should not be compared.
Signed-off-by: Igor Ryzhov <iryzhov@nfware.com>
Avoid use-after-free situation. Flush attr_extra structure only when flushing
all attributes, not just for unintern.
Signed-off-by: Donatas Abraitis <donatas.abraitis@gmail.com>
`struct prefix p` was declared inside an if statement,
where we assign its address to a pointer that is
then passed to a sub function. This will eventually
leave us in a bad state.
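A sketch of the safe pattern, using hypothetical names (`struct pfx`,
`pfx_len_or_zero`, `lookup_len`) rather than the real code: the object is
declared at function scope so the pointer stays valid when it is used
after the if block.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, minimal prefix type; not FRR's struct prefix. */
struct pfx {
	uint8_t prefixlen;
	uint32_t addr;
};

/* Hypothetical consumer standing in for the sub function. */
static uint8_t pfx_len_or_zero(const struct pfx *p)
{
	return p ? p->prefixlen : 0;
}

static uint8_t lookup_len(int have_prefix, uint32_t addr, uint8_t len)
{
	/* Declared at function scope, so it outlives the `if` block below.
	 * Declaring it inside the block would leave `pp` dangling once the
	 * block ends.
	 */
	struct pfx p;
	const struct pfx *pp = NULL;

	if (have_prefix) {
		p.addr = addr;
		p.prefixlen = len;
		pp = &p;
	}

	return pfx_len_or_zero(pp);
}
```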
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Passing argument "&rec" of type "struct pfx_record *" and argument
"1UL" to function "read" is suspicious because
"sizeof (struct pfx_record) /*40*/" is expected.
Signed-off-by: Donatas Abraitis <donatas.abraitis@gmail.com>
Description:
Replacing memcmp at certain places,
to avoid the coverity issues caused by it.
Co-authored-by: Kantesh Mundargi <kmundaragi@vmware.com>
Signed-off-by: Iqra Siddiqui <imujeebsiddi@vmware.com>
Before this patch, if the first server crashed or was terminated, the RPKI
connection stayed _active_ forever.
With this patch, if we catch a connection problem (FATAL), we reset RPKI to
switch to another available RTR server according to the configured preference.
Signed-off-by: Donatas Abraitis <donatas.abraitis@gmail.com>
This is the initial work to move all non IPv4/IPv6 AFI related
attributes/structs to attr->extra to avoid unnecessary allocations.
Signed-off-by: Donatas Abraitis <donatas.abraitis@gmail.com>
When setting maximum-prefix-out on peer-group, the applied value on
member is 0.
Fix usage of maximum-prefix-out on peer-group.
The peer_maximum_prefix_out_(un)set functions are derived from
peer_maximum_prefix_(un)set.
Fixes: fde246e835 ("bgpd: Add an option to limit outgoing prefixes")
Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
Specifying a number is not possible with command no neighbor X.X.X.X
maximum-prefix-out
> frr(config-router-af)# no neighbor 192.168.1.2 maximum-prefix-out 1
> % Unknown command: no neighbor 192.168.1.2 maximum-prefix-out 1
This patch allows it.
Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
In situations where remove-private-AS is configured for eBGP peers
residing in a private ASN, the peer's ASN was not being retained
in the AS-Path which can allow loops to occur. This was addressed
in a prior commit but it only addressed cases where the "replace-AS"
keyword was configured.
This commit ensures we retain the peer's ASN when using
"remove-private-AS" for eBGP peers in a private ASN regardless of other
keywords.
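As a simplified sketch of the intended result (hypothetical helpers
`asn_is_private` and `strip_private_keep_peer`; the real bgpd logic also
handles the other remove-private-AS keywords and is not reproduced here),
private ASNs are dropped from the path except for the peer's own ASN:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Private-use ASN ranges per RFC 6996. */
static bool asn_is_private(uint32_t asn)
{
	return (asn >= 64512 && asn <= 65534)
	       || (asn >= 4200000000U && asn <= 4294967294U);
}

/* Hypothetical helper: copy `in` to `out`, dropping private ASNs except
 * the peer's own ASN. Returns the number of ASNs written to `out`, which
 * must have room for `n` entries.
 */
static size_t strip_private_keep_peer(const uint32_t *in, size_t n,
				      uint32_t peer_asn, uint32_t *out)
{
	size_t k = 0;

	for (size_t i = 0; i < n; i++) {
		if (asn_is_private(in[i]) && in[i] != peer_asn)
			continue; /* drop private ASN that is not the peer's */
		out[k++] = in[i];
	}
	return k;
}
```

For a path of 4200000001 4200000004 4200000005 4200000001 sent to a peer
in AS 4200000001, this keeps 4200000001 4200000001, so the peer can still
detect the loop.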
Setup:
=========
router bgp 4200000002
neighbor enp1s0 interface v6only remote-as external
neighbor enp6s0 interface v6only remote-as external
!
address-family ipv4 unicast
neighbor enp6s0 remove-private-AS
exit-address-family
ub18# show ip bgp sum | include 420000
BGP router identifier 100.64.0.111, local AS number 4200000002 vrf-id 0 <<<<< local asn 4200000002
ub20(enp1s0) 4 4200000001 22 22 0 0 0 00:00:57 1 1
ub20(enp6s0) 4 4200000001 21 22 0 0 0 00:00:57 0 1 <<<< peer asn 4200000001
ub18# show ip bgp | include 0.2
Default local pref 100, local AS 4200000002
*> 100.64.0.2/32 enp1s0 0 0 4200000001 4200000004 4200000005 4200000001 i
Before ("remove-private-AS" only):
=========
ub18# show ip bgp neighbors enp6s0 advertised-routes | include 100.64.0.2
*> 100.64.0.2/32 :: 0 i <<<<< empty as-path, no way to prevent loop
After ("remove-private-AS" only):
=========
ub18# show ip bgp neighbors enp6s0 advertised-routes | include 100.64.0.2
*> 100.64.0.2/32 :: 0 4200000001 4200000001 i <<<< retain peer's asn, breaks loop
Ticket: 2857047
Signed-off-by: Trey Aspelund <taspelund@nvidia.com>
Opaque data takes up a lot of memory when there are a lot of routes on
the box. Given that this is just cosmetic info, I propose to disable
it by default to not shock people who start using FRR for the first time
or upgrade from an old version.
Fixes: #10101
Signed-off-by: Igor Ryzhov <iryzhov@nfware.com>
```
exit1-debian-11(config-router)# neighbor 192.168.100.3 remote-as external
exit1-debian-11(config-router)# do sh run | include extended
exit1-debian-11(config-router)# neighbor 192.168.100.3 capability extended-nexthop
exit1-debian-11(config-router)# do sh run | include extended
neighbor 192.168.100.3 capability extended-nexthop
exit1-debian-11(config-router)# no neighbor 192.168.100.3 capability extended-nexthop
exit1-debian-11(config-router)# do sh run | include extended
exit1-debian-11(config-router)# neighbor eth0 interface remote-as external
exit1-debian-11(config-router)# do sh run | include extended
exit1-debian-11(config-router)# neighbor eth0 capability extended-nexthop
exit1-debian-11(config-router)# do sh run | include extended
exit1-debian-11(config-router)# no neighbor eth0 capability extended-nexthop
exit1-debian-11(config-router)# do sh run | include extended
no neighbor eth0 capability extended-nexthop
exit1-debian-11(config-router)#
```
Signed-off-by: Donatas Abraitis <donatas.abraitis@gmail.com>
Abstract:
- The command "neighbor PEER maximum-prefix-out NUMBER" cannot be applied
without clearing the BGP neighbor.
- Apply the maximum-prefix-out value as soon as it is modified without
clearing the neighbor.
subgroup_update_packet() and subgroup_withdraw_packet() respectively
manage the announcement and withdrawal BGP messages to the peer.
subgrp->scount counts the number of sent prefixes.
Before the patch, the maximum out prefix limitation was applied in
subgroup_update_packet() so that subgrp->scount never exceeded the
limit. Setting a limit lower than the effective number of sent prefixes
did not result in sending any withdrawal message to reduce the number of
sent prefixes. Without clearing the BGP neighbor, the limitation only
applied to the announcement of new prefixes once the limit had been
reached.
With the patch, the limitation is checked in subgroup_announce_check().
The function is intended to say whether a prefix has to be announced in
regards to the prefix-list, route-map... Now when a maximum-prefix-out
value is changed/removed, the neighbor AFI/SAFI table is re-parsed in
the same way as for the application of route-map, prefix-lists...
Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
Take into account the maximum-prefix-out value when calculating the
update-group hash.
Fixes: fde246e835 ("bgpd: Add an option to limit outgoing prefixes")
Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
BGP EVPN custom `union gw_addr` is basically the same thing as a common
`struct ipaddr` but it lacks the address family which is needed in some
cases.
Signed-off-by: Igor Ryzhov <iryzhov@nfware.com>
This code is populating a temporary variable `add` instead of the attr.
Initially this variable was later copied to the attr but the copying was
erroneously deleted by 0a50c2481. Directly populate the attr to restore
the correct behavior.
Signed-off-by: Igor Ryzhov <iryzhov@nfware.com>
Here we try to compare the new attr with the existing one but this call
compares the existing index with zero instead. attrhash_cmp already
compares indexes using overlay_index_same so this call is both wrong and
useless.
Signed-off-by: Igor Ryzhov <iryzhov@nfware.com>
```
exit1-debian-11# sh ip bgp 10.10.10.10/32
BGP routing table entry for 10.10.10.10/32, version 14
Paths: (1 available, best #1, table default)
Not advertised to any peer
65000, (stale)
192.168.0.2 from 192.168.0.2 (0.0.0.0)
Origin incomplete, metric 0, valid, external, best (First path received)
Last update: Wed Jan 19 17:13:51 2022
Time until Graceful Restart stale route deleted: 117
```
Signed-off-by: Donatas Abraitis <donatas.abraitis@gmail.com>
If the existing listener is the same as the peer, treat as self and reject.
```
exit1-debian-11# sh bgp listeners
Name fd Address
---------------------------
default 24 192.168.10.123
exit1-debian-11# con
exit1-debian-11(config)# router bgp
exit1-debian-11(config-router)# neighbor 192.168.10.123 remote-as external
% Can not configure the local system as neighbor
exit1-debian-11# sh bgp listeners
Name fd Address
---------------------------
default 24 0.0.0.0
default 25 ::
exit1-debian-11# con
exit1-debian-11(config)# router bgp
exit1-debian-11(config-router)# neighbor 192.168.10.123 remote-as external
% Can not configure the local system as neighbor
exit1-debian-11(config-router)#
exit1-debian-11# sh bgp listeners
Name fd Address
---------------------------
default 24 192.168.0.1
exit1-debian-11# con
exit1-debian-11(config)# router bgp
exit1-debian-11(config-router)# neighbor 192.168.10.123 remote-as external
exit1-debian-11(config-router)#
```
Signed-off-by: Donatas Abraitis <donatas.abraitis@gmail.com>
'show bgp ... neighbor [routes|received-routes]' both incorrectly
used a json key of 'advertisedRoutes'.
This corrects the key to be 'receivedRoutes' for commands where
the displayed routes were received, not advertised.
before:
unet> r3 show ip bgp neigh 10.2.30.2 received-routes json | include Routes
"advertisedRoutes":{
after:
ub18# show ip bgp neighbors enp1s0 received-routes json | include Routes
"receivedRoutes":{
ub18# show ip bgp neighbors enp1s0 advertised-routes json | include Routes
"advertisedRoutes":{
Signed-off-by: Trey Aspelund <taspelund@nvidia.com>
Used for graceful-restart mostly.
Especially for bgp_show_neighbor_graceful_restart_capability_per_afi_safi()
Signed-off-by: Donatas Abraitis <donatas.abraitis@gmail.com>
The bgp_notify_conditional_adv_scanner function was looping
over all peers and only matching on the passed-in peer,
based upon the subgroup. As such we do not need to loop
over everything; just cut to the chase and modify
the peer structure directly.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Unsuppress routes that are part of the aggregation when the route-map
configuration is removed before the aggregation itself.
Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
Description:
This change fixes an issue with clearing stale leaked routes:
- Whenever BGP goes down, after the TCP connection is brought down and
capabilities are renegotiated, once the connection is reestablished
bgp_clear_stale_route does not clear VRF leaked routes.
- While BGP is clearing stale routes, we need to handle the withdrawal
of routes for VRF route-leaking.
Co-authored-by: Kantesh Mundaragi <kmundaragi@vmware.com>
Signed-off-by: Iqra Siddiqui <imujeebsiddi@vmware.com>
Extended BGP Administrative Shutdown Communication (rfc9003):
Basically, shutdown message size is increased to 255 from 128.
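A minimal sketch of the length rule, assuming a hypothetical encoder
(`shutdown_msg_encode` and `SHUTDOWN_MSG_MAX` are illustrative names, not
the bgpd ones): the communication is carried as a one-octet length
followed by the message bytes, and anything longer than 255 octets is
refused.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SHUTDOWN_MSG_MAX 255 /* RFC 9003 limit; RFC 8203 allowed 128 */

/* Hypothetical encoder: writes a one-octet length followed by the
 * message bytes into `buf`. Returns bytes written, or 0 if the message
 * is too long or `buf` is too small.
 */
static size_t shutdown_msg_encode(const char *msg, uint8_t *buf,
				  size_t buflen)
{
	size_t len = strlen(msg);

	if (len > SHUTDOWN_MSG_MAX || buflen < len + 1)
		return 0;

	buf[0] = (uint8_t)len;
	memcpy(buf + 1, msg, len);
	return len + 1;
}
```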
Signed-off-by: Donatas Abraitis <donatas.abraitis@gmail.com>
When used with LLGR, setting the GR restart-time timer to 0 should be allowed,
so that the LLGR timers start immediately.
Signed-off-by: Donatas Abraitis <donatas.abraitis@gmail.com>
rfc7196 recommends:
In addition, BGP implementations have an internal constant, which we
will call the 'maximum penalty', and the current computed penalty may
not exceed it.
Router Maximum Penalty: The internal constant for the maximum
penalty value MUST be raised to at least 50,000.
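The cap can be sketched as a saturating add (hypothetical `penalty_add`
and `MAX_PENALTY`; the actual dampening code and its units are not shown
here):

```c
#include <stdint.h>

#define MAX_PENALTY 50000U /* lower bound recommended by RFC 7196 */

/* Hypothetical helper: add `increment` to `penalty`, saturating at
 * MAX_PENALTY so the computed penalty never exceeds the constant.
 */
static uint32_t penalty_add(uint32_t penalty, uint32_t increment)
{
	if (penalty >= MAX_PENALTY || increment >= MAX_PENALTY - penalty)
		return MAX_PENALTY;
	return penalty + increment;
}
```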
Signed-off-by: Donatas Abraitis <donatas.abraitis@gmail.com>
The following subcodes are defined for the Cease NOTIFICATION
message:
Subcode Symbolic Name
1 Maximum Number of Prefixes Reached
2 Administrative Shutdown
3 Peer De-configured
4 Administrative Reset
5 Connection Rejected
6 Other Configuration Change
7 Connection Collision Resolution
8 Out of Resources
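One way to map these subcodes to their names is a lookup table indexed by
the subcode value. This is only a sketch; `cease_subcode_names` and
`cease_subcode_str` are hypothetical names, not the bgpd ones.

```c
#include <stddef.h>
#include <stdint.h>

/* Subcode names as listed above; the array index is the subcode value. */
static const char *const cease_subcode_names[] = {
	[1] = "Maximum Number of Prefixes Reached",
	[2] = "Administrative Shutdown",
	[3] = "Peer De-configured",
	[4] = "Administrative Reset",
	[5] = "Connection Rejected",
	[6] = "Other Configuration Change",
	[7] = "Connection Collision Resolution",
	[8] = "Out of Resources",
};

/* Return the symbolic name, or "Unknown" for values outside the table. */
static const char *cease_subcode_str(uint8_t subcode)
{
	size_t n = sizeof(cease_subcode_names)
		   / sizeof(cease_subcode_names[0]);

	if (subcode < n && cease_subcode_names[subcode])
		return cease_subcode_names[subcode];
	return "Unknown";
}
```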
Signed-off-by: Donatas Abraitis <donatas.abraitis@gmail.com>
Currently, it is possible to rename the default VRF either by passing
`-o` option to zebra or by creating a file in `/var/run/netns` and
binding it to `/proc/self/ns/net`.
In both cases, only zebra knows about the rename and other daemons learn
about it only after they connect to zebra. This is a problem, because
daemons may read their config before they connect to zebra. To handle
this rename after the config is read, we have some special code in every
single daemon, which is not very bad but not desirable in my opinion.
But things are getting worse when we need to handle this in northbound
layer as we have to manually rewrite the config nodes. This approach is
already hacky, but still works as every daemon handles its own NB
structures. But it is completely incompatible with the central
management daemon architecture we are aiming for, as mgmtd doesn't even
have a connection with zebra to learn from it. And it shouldn't have it,
because operational state changes should never affect configuration.
To solve the problem and simplify the code, I propose to expand the `-o`
option to all daemons. By using the startup option, we let daemons know
about the rename before they read their configs so we don't need any
special code to deal with it. There's an easy way to pass the option to
all daemons by using `frr_global_options` variable.
Unfortunately, the second way of renaming by creating a file in
`/var/run/netns` is incompatible with the new mgmtd architecture.
Theoretically, we could force daemons to read their configs only after
they connect to zebra, but it means adding even more code to handle a
very specific use-case. And anyway this won't work for mgmtd as it
doesn't have a connection with zebra. So I had to remove this option.
Signed-off-by: Igor Ryzhov <iryzhov@nfware.com>
Always free the locally allocated attribute not the one we are using for
return. This fixes a memory leak and a crash when AS Path is set with
route-map.
Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
Currently the Wait for Install code ( bgp_suppress_fib ) does
not properly handle two states from zebra: ROUTE_INSTALL_FAILED
and BETTER_ADMIN_DISTANCE_WON. Before this change the WFI code
would just never notify our peers about a route install failure,
but more is needed. In the ROUTE_INSTALL_FAILED and the
BETTER_ADMIN_DISTANCE_WON cases we need to notify our peers with
a withdrawal of the route, else we will continue to
draw traffic to us when we cannot legally do so.
Why is this needed? In either case imagine that we've already
received a bgp route, installed it and sent it to our peers.
In the BETTER_ADMIN_DISTANCE_WON case, say a static route is installed:
at this point in time we must stop advertising the route through
us since our route is not installed. As such a withdrawal must be sent.
In the ROUTE_INSTALL_FAILED case, the code was not properly handling
the situation where we have Route A, it was successfully installed,
and then we received an update to Route A that was attempted to be
installed but failed. In this case we also need to send a withdrawal.
Finally update the bgp_suppress_fib topotest to test both of these
situations.
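The decision described above can be sketched as a small mapping from the
zebra notification to the action taken towards peers. The enum values,
`ROUTE_INSTALLED`, `enum peer_action` and `wfi_peer_action` are
hypothetical; only the two failure state names come from the text.

```c
/* Notification names follow the text above; the enum values and the
 * action type are hypothetical.
 */
enum fib_notify {
	ROUTE_INSTALLED,
	ROUTE_INSTALL_FAILED,
	BETTER_ADMIN_DISTANCE_WON,
};

enum peer_action { PEER_ADVERTISE, PEER_WITHDRAW };

static enum peer_action wfi_peer_action(enum fib_notify note)
{
	switch (note) {
	case ROUTE_INSTALLED:
		return PEER_ADVERTISE;
	case ROUTE_INSTALL_FAILED:
	case BETTER_ADMIN_DISTANCE_WON:
		/* our route is not in the FIB: stop attracting traffic */
		return PEER_WITHDRAW;
	}
	return PEER_WITHDRAW; /* unknown state: be conservative */
}
```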
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
If soft-reconfiguration is enabled, bgp_adj_in_set will be called
from bgp_update and bgp_adj_in_set will call bgp_attr_intern to intern
attr pointer. If given attr isn't found in attrhash, hash_get will call
bgp_attr_hash_alloc to allocate new attr structure. In
bgp_attr_hash_alloc, NULL will be assigned to srv6_vpn field and
srv6_l3vpn field in the original attr pointer. attr->srv6_vpn and
attr->srv6_l3vpn are interned in bgp_attr_intern, so NULL assignment
isn't needed.
And, these fields are used later in bgp_update to set SRv6 information
to bgp_path_info. If bgp_attr_hash_alloc assigns NULL to these fields,
SRv6 information will be lost and incorrect routes are inserted into
data-plane.
Signed-off-by: Ryoga Saito <contact@proelbtn.com>
Don't hide the LABELED_UNICAST safi when processing route
updates; map it where necessary (to use the UNICAST table
for instance).
Signed-off-by: Mark Stapp <mstapp@nvidia.com>
Move the "longer-prefixes" option from show_ip_bgp_cmd to
show_ip_bgp_json_cmd so that it has access to JSON output.
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
Move the "route-map" option from show_ip_bgp_cmd to
show_ip_bgp_json_cmd so that it has access to JSON output.
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
Move the "filter-list" option from show_ip_bgp_cmd to
show_ip_bgp_json_cmd so that it has access to JSON output.
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
Move the "prefix-list" option from show_ip_bgp_cmd to
show_ip_bgp_json_cmd so that it has access to JSON output.
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
Move the "community-list" option from show_ip_bgp_cmd to
show_ip_bgp_json_cmd so that it has access to JSON output.
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
There's no need to have different calls to bgp_show() when the only
difference is one argument that corresponds to a "void *" parameter.
Code duplication should be reduced to a minimum to avoid bugs like
the one fixed in the previous commit.
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
Like done in the other places (when "all" isn't used), pass the
actual alias name to bgp_show() instead of a null pointer.
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
Duplicate a couple of definitions in order to remove the bgpd
includes from this libfrr header. This is necessary to fix some
name collisions like PREFIX_LIST_IN being defined differently on
multiple daemons (as soon as other daemons start including
route_opaque.h).
Including daemon headers on libfrr headers is a bad practice and
should be avoided whenever possible.
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
Add a parameter to the resolver API that is the VRF identifier. This makes
resolution specific to each VRF. When the VRF netns backend is used,
this is very practical, since resolution may be possible in one netns
but not in another.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Since f60a1188 we store a pointer to the VRF in the interface structure.
There's no need anymore to store a separate vrf_id field.
Signed-off-by: Igor Ryzhov <iryzhov@nfware.com>
For IPv4 matching, we have "match ip next-hop address A.B.C.D".
For IPv6 matching, we have "match ipv6 next-hop X:X::X:X".
To have consistency, let's add "address" keyword to IPv6 commands.
Old commands are preserved as hidden for backward compatibility.
Signed-off-by: Igor Ryzhov <iryzhov@nfware.com>
Description:
EVPN routes are marked as imported routes and carry
bgp path info extra, whereas they are not truly
imported routes, so the original bgp path info will be NULL.
Co-authored-by: Kantesh Mundaragi <kmundaragi@vmware.com>
Signed-off-by: Iqra Siddiqui <imujeebsiddi@vmware.com>
Description:
Added a macro which optimises some part of the code.
Co-authored-by: Santosh P K <sapk@vmware.com>
Co-authored-by: Kantesh Mundaragi <kmundaragi@vmware.com>
Signed-off-by: Iqra Siddiqui <imujeebsiddi@vmware.com>
Description:
Incorrect behavior during best path selection for the imported routes.
Imported routes are always treated as eBGP routes.
Change is intended for fixing the issues related to
bgp best path selection for leaked routes:
- FRR does ecmp for the imported routes,
even without any ecmp related config.
If the same prefix is imported from two different VRFs,
then we configure the route with ecmp even without
any ecmp related config.
- Locally imported routes are preferred over imported
eBGP routes.
If there is a local route and eBGP learned route
for the same prefix, if we import both the routes,
imported local route is selected as best path.
- Same route is imported from multiple tenant VRFs,
both imported routes point to the same VRF in nexthop.
- When the same route with same nexthop in two different VRFs
is imported from those two VRFs, route is not installed as ecmp,
even though we had ecmp config.
- During best path selection, while comparing the paths for imported routes,
we should correctly refer to the original route i.e. the ultimate path.
- When the same route is imported from multiple VRF,
use the correct VRF while installing in the FIB.
- When same route is imported from two different tenant VRFs,
while comparing bgp path info as part of bgp best path selection,
we should ideally also compare corresponding VRFs.
See-also: https://github.com/FRRouting/frr/files/7169555/FRR.and.Cisco.VRF-Lite.Behaviour.pdf
Co-authored-by: Santosh P K <sapk@vmware.com>
Co-authored-by: Kantesh Mundaragi <kmundaragi@vmware.com>
Signed-off-by: Iqra Siddiqui <imujeebsiddi@vmware.com>
The fix consists of parsing the IPv6 extended community list while taking
into account the size of the ecommunity value.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
We should always treat the VRF interface as a loopback. Currently, this
is not the case, because in some old pre-VRF code we use if_is_loopback
instead of if_is_loopback_or_vrf. To avoid any future problems, the
proposal is to rename if_is_loopback_or_vrf to if_is_loopback and use it
everywhere. if_is_loopback is renamed to if_is_loopback_exact in case
it's ever needed, but currently it's not used anywhere.
Signed-off-by: Igor Ryzhov <iryzhov@nfware.com>
We should send only a 16-byte next hop; there is no need for 32 bytes, since
third-party next hops for LLA do not work here.
Signed-off-by: Donatas Abraitis <donatas.abraitis@gmail.com>
When debugging issues for routes in multiple VRFs, it would
be extremely useful if the debug output showed which VRF we
are acting on.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
When "update-source IFNAME" is used for the neighbor, p->update_source
is set to NULL, so we can't use it as a source address and should use
the address from p->su_local.
Signed-off-by: Igor Ryzhov <iryzhov@nfware.com>
We had various forms of min/max macros across multiple daemons
all of which duplicated what we have in compiler.h. Convert
everyone to use the `correct` ones.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
AFI/SAFI is handled in bgp_vty_find_and_parse_afi_safi_bgp() properly for
IPv4, but not for IPv6. Let's have it enabled for IPv6 by default.
Signed-off-by: Donatas Abraitis <donatas.abraitis@gmail.com>
Update BFD sessions when the update-source configuration is set so the
session follows the new configured source address.
Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
When altering the TTL of an eBGP peer also update the BFD
configuration. This was only working when the configuration happened
after the peer connection had been established.
Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
if_lookup_by_index_all_vrf doesn't work correctly with netns VRF backend
as the same index may be used in multiple netns simultaneously.
We always know the BGP instance we work with, so use its VRF id for the
interface lookup.
Signed-off-by: Igor Ryzhov <iryzhov@nfware.com>
```
exit1-debian-9(config-route-map)# match ip route-source prefix-list ?
<cr>
PREFIXLIST_NAME IP prefix-list name
p1 p2
```
Signed-off-by: Donatas Abraitis <donatas.abraitis@gmail.com>
Before:
```
192.168.10.17 OPEN has MultiProtocol Extensions capability (1), length 4
192.168.10.17 OPEN has MP_EXT CAP for afi/safi: IPv4/unicast
```
After:
```
192.168.10.17 OPEN has MultiProtocol Extensions capability (1), length 4
192.168.10.17 OPEN has MultiProtocol Extensions capability for afi/safi: IPv4/unicast
```
Signed-off-by: Donatas Abraitis <donatas.abraitis@gmail.com>
When entering the command `no neighbor <X> ebgp-multihop <Y>`
the bgp code was always resetting the connection even if
the command would do nothing.
Fixes: #6464
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
```
exit1-debian-9(config)# route-map test1 permit 10
exit1-debian-9(config-route-map)# match community ?
(1-99) Community-list number (standard)
(100-500) Community-list number (expanded)
COMMUNITY_LIST_NAME Community-list name
testas
exit1-debian-9(config-route-map)# match large-community ?
(1-99) Large Community-list number (standard)
(100-500) Large Community-list number (expanded)
LCOMMUNITY_LIST_NAME Large Community-list name
LCL-ORIGINATED-ALL
exit1-debian-9(config-route-map)#
```
Signed-off-by: Donatas Abraitis <donatas.abraitis@gmail.com>
This removes a giant `switch { }` block from lib/zclient.c and
harmonizes all zclient callback function types to be the same (some had
a subset of the args, some had a void return, now they all have
ZAPI_CALLBACK_ARGS and int return.)
Apart from getting rid of the giant switch, this is a minor security
benefit since the function pointers are now in a `const` array, so they
can't be overwritten by e.g. heap overflows for code execution anymore.
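The general shape of such a table-driven dispatch can be sketched as
follows. All names here (`enum msg_type`, `msg_handler`, `handlers`,
`dispatch`) are hypothetical and much simpler than the real zclient
callbacks, which take ZAPI_CALLBACK_ARGS.

```c
#include <stddef.h>

/* Hypothetical message types and handler signature; the real code uses
 * ZAPI_CALLBACK_ARGS.
 */
enum msg_type { MSG_IF_ADD, MSG_IF_DEL, MSG_MAX };

typedef int (*msg_handler)(int arg);

static int handle_if_add(int arg)
{
	return arg + 1;
}

static int handle_if_del(int arg)
{
	return arg - 1;
}

/* The array itself is const, so it can be placed in read-only memory. */
static const msg_handler handlers[MSG_MAX] = {
	[MSG_IF_ADD] = handle_if_add,
	[MSG_IF_DEL] = handle_if_del,
};

/* Look up the handler by message type instead of using a switch. */
static int dispatch(enum msg_type type, int arg)
{
	if ((unsigned int)type >= (unsigned int)MSG_MAX
	    || handlers[type] == NULL)
		return -1; /* no handler registered */
	return handlers[type](arg);
}
```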
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
The dynamic peer count is inconsistent between
"show bgp summary json" and "show bgp summary failed json" due to
the dynamic peer counter 'dn_count' being reused without being reset.
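A sketch of the idea, with a hypothetical `struct peer_example` and
`count_dynamic` (whether the failed view filters peers exactly this way
is an assumption): the counter starts from zero for every summary pass,
so nothing carries over from one command to the next.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified peer; not bgpd's struct peer. */
struct peer_example {
	bool dynamic;
	bool failed;
};

/* Count dynamic peers for one summary pass. The counter is a local that
 * starts at zero on every call, so nothing carries over between the
 * "summary" and "summary failed" passes.
 */
static unsigned int count_dynamic(const struct peer_example *peers, size_t n,
				  bool failed_only)
{
	unsigned int dn_count = 0;

	for (size_t i = 0; i < n; i++) {
		if (failed_only && !peers[i].failed)
			continue;
		if (peers[i].dynamic)
			dn_count++;
	}
	return dn_count;
}
```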
Signed-off-by: Abhishek Naik <bhini@amazon.com>