Delay BGP configuration until we receive end-configuration hook to make sure
we don't send partial updates to peer which leads to broken Graceful-Restart.
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
For the later patches, this patch changes the behavior of alloc_new sid
so that bgpd record not only SID for VRF, but also Locator of SID.
Signed-off-by: Ryoga Saito <ryoga.saito@linecorp.com>
Conversion of bgp error codes returned for cli input into
an enum and then properly handling all the error cases
in bgp_vty_return.
Because not all error codes returned were properly handled
in this function there existed configuration examples that
were accepted on the cli without an error message but not
saved.
Fixes: #10589
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
confederations are checking to see that the bgp pointer
is non-null. But it's impossible to have a null pointer
in the cli and in all paths we have already deref'ed the bgp
pointer. Let's remove that error code as that it is impossible
to happen.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Add a 15 minute warning to the logging system when
bgp policy is not setup properly. Operators keep asking
about the missing policy( on upgrade typically ). Let's
try to give them a bit more of a hint when something is
going wrong as that they are clearly missing the other
various places FRR tells them about it.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
When setting maximum-prefix-out on peer-group, the applied value on
member is 0.
Fix usage of maximum-prefix-out on peer-group.
The peer_maximum_prefix_out_(un)set functions are derived from
peer_maximum_prefix_(un)set.
Fixes: fde246e835 ("bgpd: Add an option to limit outgoing prefixes")
Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
Abstract:
- The command "neighbor PEER maximum-prefix-out NUMBER" cannot be applied
without clearing the BGP neighbor.
- Apply the maximum-prefix-out value as soon as it is modified without
clearing the neighbor.
subgroup_update_packet() and subgroup_withdraw_packet() respectively
manages the announcement and withdrawal BGP message to the peer.
subgrp->scount counter counts the number of sent prefixes.
Before the patch, the maximum out prefix limitation was applied in
subgroup_update_packet() in order that subgrp->scount never exceeds the
limit. Setting a limit inferior to the effective number of sent prefix
did not result in sending any withdrawal message to reduce the number of
sent prefixes. Without clearing the BGP neighbor, the limitation only
applied to the announcement of new prefixes when the limitation was
over.
With the patch, the limitation is checked in subgroup_announce_check().
The function is intended to say whether a prefix has to be announced in
regards to the prefix-list, route-map... Now when a maximum-prefix-out
value is changed/removed, the neighbor AFI/SAFI table is re-parsed in
the same way as for the application of route-map, prefix-lists...
Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
Extended BGP Administrative Shutdown Communication (rfc9003):
Basically, shutdown message size is increased to 255 from 128.
Signed-off-by: Donatas Abraitis <donatas.abraitis@gmail.com>
We had various forms of min/max macros across multiple daemons
all of which duplicated what we have in compiler.h. Convert
everyone to use the `correct` ones
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
The idea is to disable addpath-rx capability to avoid unnecessary additional
routes installed.
Signed-off-by: Donatas Abraitis <donatas.abraitis@gmail.com>
BGP can experience a bunch of errors associated with sockets
being manipulated which would prevent the peer from coming up.
Let's add some additional debug information here so that
our operators can do a bit more for themselves.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Current implementation of SRv6 SID allocation algorithm sets most least
2 bytes. But, according to RFC8986, function bits is located in the next
to locator. New allocation alogirithm respects this format.
Signed-off-by: Ryoga Saito <contact@proelbtn.com>
This is to avoid breaking changes between existing deployments of
extended community for bandwidth encoding. By default FRR uses uint32
to encode bandwidth, which is not as the draft requires (IEEE floating-point).
This switch enables the required encoding per-peer.
Signed-off-by: Donatas Abraitis <donatas.abraitis@gmail.com>
When BGP is notified by RIB that peer address is unreachable then BGP session must be brought
down immediately and not wait for the hold-timer expiry. Today single-hop EBGP already behaves
this way but need to change for iBGP and multi-hop EBGP sessions.
Signed-off-by: Prerana G.B <prerana@vmware.com>, Pushpasis Sarkar <spushpasis@vmware.com>
in bgp_io.c upon packet read of some error we are storing
the peer pointer on a thread to call bgp_packet_process_error.
In this case an event is generated that is not guaranteed to be
run immediately. It could come in *after* the peer data structure
is deleted and as such we now are writing into memory that we
no longer possibly own as a peer data structure.
Modify the code so that the peer can track the thread associated
with the read error and then it can wisely kill that thread
when deleting the peer data structure.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Adds a knob that sets the time between loc-rib scans for conditional
advertisement.
I chose the range (5-240) because 1 second seems dumb and too easy to
hurt yourself at even moderate scale, 5 seconds you can still hurt
yourself but I could see a use case for it, and 4 minutes should be
enough for anyone (tm)
Signed-off-by: Quentin Young <qlyoung@nvidia.com>
Introduces bgp->default_af to selectively enable various default
afi/safis to be inherited by new peers.
Makes default_af flag logic consistent for all address-families, i.e.
instead of a "no default" flag for ipv4 and a "default" flag for ipv6,
just use "default" for both and make it true for ipv4 by default.
Removes old BGP_FLAG_NO_DEFAULT_IPV4 and BGP_FLAG_DEFAULT_IPV6, and
cleans up bgp->flags bit definitions to avoid gaps for unused bits.
Signed-off-by: Trey Aspelund <taspelund@nvidia.com>
Gateway IP overlay index of the remote type-5 route is resolved
recursively using remote type-2 route. For the purpose of this
recursive resolution, for each L2VNI, we build a hash table of the
remote IP addresses received by remote type-2 routes.
For the topologies where overlay index resolution is not needed, we
do not need to build this remote-ip-hash.
Thus, make the recursive resolution of the overlay index conditional on
"enable-resolve-overlay-index" configuration.
router bgp 65001
bgp router-id 192.168.100.1
neighbor 10.0.1.2 remote-as 65002
!
address-family l2vpn evpn
neighbor 10.0.1.2 activate
advertise-all-vni
enable-resolve-overlay-index----------> New configuration
exit-address-family
Gateway IP overlay index will be resolved only if this configuration is present.
Signed-off-by: Ameya Dharkar <adharkar@vmware.com>
When EVPN prefix route with a gateway IP overlay index is imported into the IP
vrf at the ingress PE, BGP nexthop of this route is set to the gateway IP.
For this vrf route to be valid, following conditions must be met.
- Gateway IP nexthop of this route should be L3 reachable, i.e., this route
should be resolved in RIB.
- A remote MAC/IP route should be present for the gateway IP address in the
EVI(L2VPN table).
To check for the first condition, gateway IP is registered with nht (nexthop
tracking) to receive the reachability notifications for this IP from zebra RIB.
If the gateway IP is reachable, zebra sends the reachability information (i.e.,
nexthop interface) for the gateway IP.
This nexthop interface should be the SVI interface.
Now, to find out type-2 route corresponding to the gateway IP, we need to fetch
the VNI for the above SVI.
To do this VNI lookup effitiently, define a hashtable of struct bgpevpn with
svi_ifindex as key.
struct hash *vni_svi_hash;
An EVI instance is added to vni_svi_hash if its svi_ifindex is nonzero.
Using this hash, we obtain struct bgpevpn corresponding to the gateway IP.
For gateway IP overlay index recursive lookup, once we find the correct EVI, we
have to lookup its route table for a MAC/IP prefix. As we have to iterate the
entire route table for every lookup, this lookup is expensive. We can optimize
this lookup by adding all the remote IP addresses in a hash table.
Following hash table is defined for this purpose in struct bgpevpn
Struct hash *remote_ip_hash;
When a MAC/IP route is installed in the EVI table, it is also added to
remote_ip_hash.
It is possible to have multiple MAC/IP routes with the same IP address because
of host move scenarios. Thus, for every address addr in remote_ip_hash, we
maintain list of all the MAC/IP routes having addr as their IP address.
Following structure defines an address in remote_ip_hash.
struct evpn_remote_ip {
struct ipaddr addr;
struct list *macip_path_list;
};
A Boolean field is added to struct bgp_nexthop_cache to indicate that the
nexthop is EVPN gateway IP overlay index.
bool is_evpn_gwip_nexthop;
A flag BGP_NEXTHOP_EVPN_INCOMPLETE is added to struct bgp_nexthop_cache.
This flag is set when the gateway IP is L3 reachable but not yet resolved by a
MAC/IP route.
Following table explains the combination of L3 and L2 reachability w.r.t.
BGP_NEXTHOP_VALID and BGP_NEXTHOP_EVPN_INCOMPLETE flags
* | MACIP resolved | MACIP unresolved
*----------------|----------------|------------------
* L3 reachable | VALID = 1 | VALID = 0
* | INCOMPLETE = 0 | INCOMPLETE = 1
* ---------------|----------------|--------------------
* L3 unreachable | VALID = 0 | VALID = 0
* | INCOMPLETE = 0 | INCOMPLETE = 0
Procedure that we use to check if the gateway IP is resolvable by a MAC/IP
route:
- Find the EVI/L2VRF that belongs to the nexthop SVI using vni_svi_hash.
- Check if the gateway IP is present in remote_ip_hash in this EVI.
When the gateway IP is L3 reachable and it is also resolved by a MAC/IP route,
unset BGP_NEXTHOP_EVPN_INCOMPLETE flag and set BGP_NEXTHOP_VALID flag.
Signed-off-by: Ameya Dharkar <adharkar@vmware.com>
Adds gateway-ip option to advertise ipv4/ipv6 unicast CLI.
dev(config-router-af)# advertise <ipv4|ipv6> unicast
<cr>
gateway-ip Specify EVPN Overlay Index
route-map route-map for filtering specific routes
When gateway-ip is specified, gateway IP field of EVPN RT-5 NLRI is filled with
the BGP nexthop of the vrf prefix being advertised.
No support for ESI overlay index yet.
Test cases:
1) advertise ipv4 unicast
2) advertise ipv4 unicast gateway-ip
3) advertise ipv6 unicast
4) advertise ipv6 unicast gateway-ip
5) Modify from no-overlay-index to gateway-ip
6) Modify from gateway-ip to no-overlay-index
7) CLI with route-map and modify route-map
Author: Sri Mohana Singamsetty <srimohans@gmail.com>
Signed-off-by: Sri Mohana Singamsetty <srimohans@gmail.com>
We are inconsistently using peer_establiahed(peer) with
sometimes using `peer->status == Established`. Just Convert
over to using the function for consistency.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
This commit add base-lines for BGP SRv6 VPN support.
srv6_locator_chunks property of struct bgp is used
to store BGPd's own SRv6 locator chunk getting with
ZEBRA_SRV6_MANAGER_GET_LOCATOR_CHUNK api.
And srv6_functions is used to store BGP's srv6
localsids. It's mainly used when new SID reservation
from locator chunks.
Signed-off-by: Hiroki Shirokura <slank.dev@gmail.com>
base
BGP_MAX_PACKET_SIZE no longer represented the absolute maximum BGP
packet size as it did before, instead it was defined as 4096 bytes,
which is the maximum unless extended message capability is negotiated,
in which case the maximum goes to 65k.
That introduced at least one bug - last_reset_cause was undersized for
extended messages, and when sending an extended message > 4096 bytes
back to a peer as part of NOTIFY data would trigger a bounds check
assert.
This patch redefines the macro to restore its previous meaning,
introduces a new macro - BGP_STANDARD_MESSAGE_MAX_PACKET_SIZE - to
represent the 4096 byte size, and renames the extended size to
BGP_EXTENDED_MESSAGE_MAX_PACKET_SIZE for consistency. Code locations
that definitely should use the small size have been updated, locations
that semantically always need whatever the max is, no matter what that
is, use BGP_MAX_PACKET_SIZE.
BGP_EXTENDED_MESSAGE_MAX_PACKET_SIZE should only be used as a constant
when storing what the negotiated max size is for use at runtime and to
define BGP_MAX_PACKET_SIZE. Unless there is a future standard that
introduces a third valid size it should not be used for any other
purpose.
Signed-off-by: Quentin Young <qlyoung@nvidia.com>