There is no support for option 8, as per RFC7854.
Add the 64 bit counter in the peer structure.
Add the missing per peer statistic.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
There is no support for option 7, as per RFC7854.
Add the 64 bit counter in the peer structure.
Add the 64 bit bmp value write api.
Add the missing per peer statistic.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
SRTE_COLOR is not defined at all as an attribute, it was a mistake from the
beginning.
SRTE_COLOR is extended community, can't see the reason having it as a community,
and a separate attribute.
Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
Current changes deals with EVPN routes installation to zebra.
In evpn_route_select_install() we invoke evpn_zebra_install/uninstall
which sends zclient_send_message().
This is a continuation of code changes (similar to
ccfe452763) but to handle evpn part
of the code.
Ticket: #3390099
Signed-off-by: Rajasekar Raja <rajasekarr@nvidia.com>
BGP is now keeping a list of dests with the dest having a pointer
to the bgp_path_info that it will be working on.
1) When bgp receives a prefix, process it, add the bgp_dest of the
prefix into the new Fifo list if not present, update the flags (Ex:
earlier if the prefix was advertised and now it is a withdrawn),
increment the ref_count and DO NOT advertise the install/withdraw
to zebra yet.
2) Schedule an event to wake up to invoke the new function which will
walk the list one by one and installs/withdraws the routes into zebra.
a) if BUFFER_EMPTY, process the next item on the list
b) if BUFFER_PENDING, bail out and the callback in
zclient_flush_data() will invoke the same function when BUFFER_EMPTY
Changes
- rename old bgp_zebra_announce to bgp_zebra_announce_actual
- rename old bgp_zebra_withdrw to bgp_zebra_withdraw_actual
- Handle new fifo list cleanup in bgp_exit()
- New funcs: bgp_handle_route_announcements_to_zebra() and
bgp_zebra_route_install()
- Define a callback function to invoke
bgp_handle_route_announcements_to_zebra() when BUFFER_EMPTY in
zclient_flush_data()
The current change deals with bgp installing routes via
bgp_process_main_one()
Ticket: #3390099
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Signed-off-by: Rajasekar Raja <rajasekarr@nvidia.com>
Modify the bgp master to hold a type safe list for bgp_dests that need
to be passed to zebra.
Future commits will use this.
Ticket: #3390099
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Signed-off-by: Rajasekar Raja <rajasekarr@nvidia.com>
Dynamic capability provides more value without resetting the sessions for some
important other capabilities to exchange, like: graceful-restart, addpath, orf,
fqdn, etc.
Since we support it already, enable it by default.
Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
currently:
when as-path-loop-detection is set on a peer-group.
members of the peer-group are not using that functionnality.
analysis:
the as-path-loop-detection, is not using the peer's flags
related framework.
fix:
use the peer's flag framework for as-path-loop-detection.
Signed-off-by: Francois Dumontet <francois.dumontet@6wind.com>
This is required by the current (latest/-02 draft).
IANA has registered code 8 for "Send Hold Timer Expired" in the "BGP
Error (Notification) Codes" sub-registry under the "Border Gateway
Protocol (BGP) Parameters" registry.
https://datatracker.ietf.org/doc/html/draft-ietf-idr-bgp-sendholdtimer
Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
By default, iBGP and eBGP-OAD peers exchange RPKI extended community by default.
Add a command to disable sending RPKI extended community if needed.
Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
cisco routers are not dealing fairly whith unsupported capabilities.
When a cisco router receive an unsupported capabilities it reset the
negociation without notifying the unmatching capability as described in
RFC2842.
Cisco suggest the use of
neighbor x.x.x.x capability fqdn
to avoid the use of fqdn in open message.
this new command is to remove the use of fqdn capability in the
open message with the peer "x.x.x.x".
Link: https://www.cisco.com/c/en/us/support/docs/ip/border-gateway-protocol-bgp/116189-problemsolution-technology-00.pdf
Signed-off-by: Francois Dumontet <francois.dumontet@6wind.com>
Create a single registry of default port values that daemons
are using. Most of these are vty ports, but there are some
others for features like ospfapi and zebra FPM.
Signed-off-by: Mark Stapp <mjs@labn.net>
Show per VRF RPKI configuration in "show run".
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
Add a hook to call a future callback function when bgpd knows from zebra
about the activation of de-activation of a VRF. It will be used by the
RPKI module in next commits.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
When a peer has come up and already started installing
routes into the rib and `suppress-fib-pending` is either
turned on or off. BGP is left with some routes that
may need to be withdrawn from peers and routes that
it does not know the status of. Clear the BGP peers
for the interesting parties and let's let us come
up to speed as needed.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Implement proper memory cleanup for SRv6 functions and locator chunks to prevent potential memory leaks.
The list callback deletion functions have been set.
The ASan leak log for reference:
```
***********************************************************************************
Address Sanitizer Error detected in bgp_srv6l3vpn_to_bgp_vrf.test_bgp_srv6l3vpn_to_bgp_vrf/r2.asan.bgpd.4180
=================================================================
==4180==ERROR: LeakSanitizer: detected memory leaks
Direct leak of 544 byte(s) in 2 object(s) allocated from:
#0 0x7f8d176a0d28 in __interceptor_calloc (/usr/lib/x86_64-linux-gnu/libasan.so.4+0xded28)
#1 0x7f8d1709f238 in qcalloc lib/memory.c:105
#2 0x55d5dba6ee75 in sid_register bgpd/bgp_mplsvpn.c:591
#3 0x55d5dba6ee75 in alloc_new_sid bgpd/bgp_mplsvpn.c:712
#4 0x55d5dba6f3ce in ensure_vrf_tovpn_sid_per_af bgpd/bgp_mplsvpn.c:758
#5 0x55d5dba6fb94 in ensure_vrf_tovpn_sid bgpd/bgp_mplsvpn.c:849
#6 0x55d5dba7f975 in vpn_leak_postchange bgpd/bgp_mplsvpn.h:299
#7 0x55d5dba7f975 in vpn_leak_postchange_all bgpd/bgp_mplsvpn.c:3704
#8 0x55d5dbbb6c66 in bgp_zebra_process_srv6_locator_chunk bgpd/bgp_zebra.c:3164
#9 0x7f8d1716f08a in zclient_read lib/zclient.c:4459
#10 0x7f8d1713f034 in event_call lib/event.c:1974
#11 0x7f8d1708242b in frr_run lib/libfrr.c:1214
#12 0x55d5db99d19d in main bgpd/bgp_main.c:510
#13 0x7f8d160c5c86 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x21c86)
Direct leak of 296 byte(s) in 1 object(s) allocated from:
#0 0x7f8d176a0d28 in __interceptor_calloc (/usr/lib/x86_64-linux-gnu/libasan.so.4+0xded28)
#1 0x7f8d1709f238 in qcalloc lib/memory.c:105
#2 0x7f8d170b1d5f in srv6_locator_chunk_alloc lib/srv6.c:135
#3 0x55d5dbbb6a19 in bgp_zebra_process_srv6_locator_chunk bgpd/bgp_zebra.c:3144
#4 0x7f8d1716f08a in zclient_read lib/zclient.c:4459
#5 0x7f8d1713f034 in event_call lib/event.c:1974
#6 0x7f8d1708242b in frr_run lib/libfrr.c:1214
#7 0x55d5db99d19d in main bgpd/bgp_main.c:510
#8 0x7f8d160c5c86 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x21c86)
***********************************************************************************
```
Signed-off-by: Keelan Cannoo <keelan.cannoo@icloud.com>
When exporting BGP prefixes, it is necessary to configure
the route target extended communities with the following
command:
> rt vpn export <RouteTarget>
But the customer may need to configure the route-target to
apply to bgp updates, solely based on a route-map criterium.
by using the below route-map configured like that:
> route-map vpn export <routemapname>
Fix this by allowing to export bgp updates based on the
presence of route-targets on either route-map or vpn
configured rt. the exportation process is stopped
if no route target is available in the ecommunity list.
Fixes: ddb5b4880b ("bgpd: vpn-vrf route leaking")
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
It's been for a while disabled by default, but this seems reasonable to flip it.
We had `bgp enforce-first-as` as a global BGP knob to enable/disable this
behavior globally, later we introduced `enforce-first-as` per neighbor, with disabled
by default. Now let's enable this by default by bringing a global `bgp enforce-first-as`
command back.
Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
In zebra, the import check table and the nexthop check tables
were combined. This leaves an issue where when bgp happens
to have a tracked address in both the import check table
and the nexthop track table that are the same address.
When the the item is removed from one table the call
to remove it from zebra removes tracking for the other
table.
Combine the two tables together and keep track where
they came from for processing in bgpd.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
There is no command to choose to send or not the bgp4-mibv2 traps.
Since the MIB bgp4-mibv2 notification are redundant with MIB RFC4273
we added a command:
- [no] bgp snmp traps bgp4-mibv2
By default, the bgp4-mibv2 traps will be disabled, to prevent from
redundancy.
Signed-off-by: Francois Dumontet <francois.dumontet@6wind.com>
There is no cli command to prevent the router to send traps
implemented in the rfc4273. If not done, when introducing
the traps from bgp4v2mib, traps will be send for each of
the two mibs: there will be redundancy in the sent information.
Add a new command:
- [no] bgp snmp traps rfc4273
Using this command will allow or not the notification of
the following traps:
- bgpEstablishedNotification
- bgpBackwardTransNotification
Signed-off-by: Francois Dumontet <francois.dumontet@6wind.com>
For instance, it's not possible to resend FQDN capability without resetting
the session, so let's create some more elegant way to do that.
Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
Today, when configuring BGP L3VPN mpls, the operator may
use that command to hardset a label value:
> router bgp 65500 vrf vrf1
> address-family ipv4 unicast
> label vpn export <hardset_label_value>
Today, BGP uses this value without checks, leading to potential
conflicts with other control planes like LDP. For instance, if
LDP initiates with a label chunk of [16;72] and BGP also uses the
50 label value, a conflict arises.
The 'label manager' service in zebra oversees label allocations.
While all the control plane daemons use it, BGP doesn't when a
hardset label is in place.
This update fixes this problem. Now, when a hardset label is set for
l3vpn export, a request is made to the label manager for approval,
ensuring no conflicts with other daemons. But, this means some existing
BGP configurations might become non-operational if they conflict with
labels already allocated to another daemon but not used.
note: Labels below 16 are reserved and won't be checked for consistency
by the label manager.
Fixes: ddb5b4880b ("bgpd: vpn-vrf route leaking")
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
At each EBGP boundary, BGP path attributes are modified as per [RFC4271], which includes stripping any IBGP-only attributes.
Some networks span more than one autonomous system and require more flexibility in the propagation of path attributes. It is worth noting that these multi-AS networks have a common or single administrative entity. These networks are said to belong to One Administrative Domain (OAD). It is desirable to carry IBGP-only attributes across EBGP peerings when the peers belong to an OAD.
This document defines a new EBGP peering type known as EBGP-OAD, which is used between two EBGP peers that belong to an OAD. This document also defines rules for route announcement and processing for EBGP-OAD peers.
https://datatracker.ietf.org/doc/html/draft-uttaro-idr-bgp-oad
Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
Add the ability to store a raw copy of the incoming BGP Link-State
attributes and to redistribute them as is to other routes.
New types of data BGP_ATTR_LS and BGP_ATTR_LS_DATA are defined.
Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
Signed-off-by: Olivier Dugeon <olivier.dugeon@orange.com>
https://www.rfc-editor.org/rfc/rfc2042.html
says: 255 reserved for development
In FRR, 255 is kinda used too BGP_ATTR_VNC, even more we allow setting 255 in CLI.
Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
```
donatas-pc(config-router)# timers bgp 8 12
% keeplive value 8 is larger than 1/3 of the holdtime, setting to 4
donatas-pc(config-router)# do sh run | include timers bgp
timers bgp 4 12
donatas-pc(config-router)#
```
Closes https://github.com/FRRouting/frr/issues/14287
Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
This is based on @donaldsharp's work
The current code base is the struct bgp_node data structure.
The problem with this is that it creates a bunch of
extra data per route_node.
The table structure generates ‘holder’ nodes
that are never going to receive bgp routes,
and now the memory of those nodes is allocated
as if they are a full bgp_node.
After splitting up the bgp_node into bgp_dest and route_node,
the memory of ‘holder’ node which does not have any bgp data
will be allocated as the route_node, not the bgp_node,
and the memory usage is reduced.
The memory usage of BGP node will be reduced from 200B to 96B.
The total memory usage optimization of this part is ~16.00%.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Signed-off-by: Yuqing Zhao <xiaopanghu99@163.com>
As part of the conversion to a `struct peer_connection` it will
be desirable to have 2 pointers one for when we open a connection
and one for when we receive a connection. Start this actual
conversion over to this in `struct peer`. If this sounds confusing
take a look at the bgp state machine for connections and how
it resolves the processing of this router opening -vs- this
router receiving an open. At some point in time the state
machine decides that we are keeping one of the two connections.
Future commits will allow us to untangle the peer/doppelganger
duality with this abstraction.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
The status and ostatus are a function of the `struct peer_connection`
move it into that data structure.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Move PEER_THREAD_WRITES_ON and PEER_THREAD_READS_ON to
be a part of the `struct peer_connection` since this is
a connection oriented bit of data.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Move the peer->t_write and peer->t_read into `struct peer_connection`
as that these are properties of the connection.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
P# Please enter the commit message for your changes. Lines starting
BGP tracks connections based upon the peer. But the problem
with this is that the doppelganger structure for it is being
created. This has introduced a bunch of fragileness in that
the peer exists independently of the connections to it.
The whole point of the doppelganger structure was to allow
BGP to both accept and initiate tcp connections and then
when we get one to a `good` state we collapse into the
appropriate one. The problem with this is that having
2 peer structures for this creates a situation where
we have to make sure we are configing the `right` one
and also make sure that we collapse the two independent
peer structures into 1 acting peer. This makes no sense
let's abstract out the peer into having 2 connection
one for incoming connections and one for outgoing connections
then we can easily collapse down without having to do crazy
stuff. In addition people adding new features don't need
to have to go touch a million places in the code.
This is the start of this abstraction. In this commit
we'll just pull out the fd and input/output buffers
into a connection data structure. Future commits
will abstract further.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
The last_reset_cause is a plain old BGP_MAX_PACKET_SIZE buffer
that is really enlarging the peer data structure. Let's just
copy the stream that failed and only allocate how ever much
the packet size actually was. While it's likely that we have
a reset reason, the packet typically is not going to be 65k
in size. Let's save space.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
The peer->ibuf_scratch was allocating 65535 * 10 bytes
for scratch space to hold data incoming from a read
from a peer. When you have 4k peers this is 262,1400,000
or 262 mb of data. Which is crazy large. Especially
since the i/o pthread is reading per peer without
any chance of having the data interfere with other reads.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Useful to have it for datacenter profile only, disabled for traditional.
If the peer is not established or established, but has no description set,
we will show the FRR version instead, which is kinda handy to have instead of
nothing.
Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
More details: https://www.rfc-editor.org/rfc/rfc8810.html
Not sure if we want to maintain the old code more.
Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
Both the label manager and table manager zapi code send data requests via zapi
to zebra and then immediately listen for a response from zebra. The problem here
is of course that the listen part is throwing away any zapi command that is not
the one it is looking for.
ISIS/OSPF and PIM all have synchronous abilities via zapi, which they all
do through a special zapi connection to zebra. BGP needs to follow this model
as well. Additionally the new zclient_sync connection that should be created,
a once a second timer should wake up and read any data on the socket to
prevent problems too much data accumulating in the socket.
```
r3# sh bgp labelpool summary
Labelpool Summary
-----------------
Ledger: 3
InUse: 3
Requests: 0
LabelChunks: 1
Pending: 128
Reconnects: 1
r3# sh bgp labelpool inuse
Prefix Label
---------------------------
10.0.0.1/32 16
192.168.31.0/24 17
192.168.32.0/24 18
r3#
```
Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
In the context of the ASBR facing an EBGP neighbor, or
facing an IBGP neighbor where the BGP updates received
are re-advertised with a modified next-hop, a new local
label will be re-advertised too, to replace the original
one.
Create a binding table, in the form of a hash list, from the
original labels to the new labels. Since labels can be the
same on several routers, set the next-hop and the label as
the keys. Add the needed API functions to manage the hash
list.
Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
When acting as intermediate device for BGP signaling, and
as transit device for data traffic, the device is not able
to modify the label value from incoming MPLS VPN updates:
- as BGP device, modifying the label value is necessary
when redistributing VPN prefixes with its own next-hop.
- as transit device that connects two ethernet segments
on separate interfaces, the return MPLS traffic must be
handled: the modified label value must be swapped with
the original label value and sent back to the original
next-hop.
The border router use case can be taken as example, when
it acts both as transit and as BGP device:
- When receiving updates from a border router peer, and where
interior traffic is expected to transit through the local
border router.
- When receiving updates from interior devices, and where
exterior traffic will transit through the local border router.
In those two situations, a new label is bound to the received
entry, and the entry is advertised to a new peer with the new
label. In the same time, an MPLS entry is created to handle
return traffic with the new mpls label: the traffic would be
swapped to the original MPLS label and the original next-hop.
This is the first commit of a series of patches, that address
the above mentioned issue.
The first commit introduces a new per-interface command:
> interface eth0
> [no] mpls bgp l3vpn-multi-domain-switching
> exit
This command will authorise mpls vpn updates to have a new
label value bound to the mpls vpn routes received over that
interface.
Link: https://www.rfc-editor.org/rfc/rfc3107.html#section-3
> When a BGP speaker redistributes a route, the label(s) assigned to
> that route must not be changed (except by omission), unless the
> speaker changes the value of the Next Hop attribute of the route.
Link: https://www.rfc-editor.org/rfc/rfc3031.html#section-4.6
Link: https://www.rfc-editor.org/rfc/rfc4364.html#section-10
sub-chapter b.
Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
When using `addpath-tx-all` BGP announces all known paths instead of announcing
only an arbitrary number of best paths.
With this new command we can send N best paths to the neighbor. That means, we
send the best path, then send the second best path excluding the previous one,
and so on. In other words, we run best path selection algorithm N times before
we finish.
Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
Currently we have a handler function that will walk the global EVPN
rib and unimport/remove routes matching a local IP/TIP. This generalizes
this function so that it can be re-used for other BGP Martian entry
types. Now this can be used to unimport routes when the MAC-VRF SoO is
reconfigured.
Signed-off-by: Trey Aspelund <taspelund@nvidia.com>
This commit introduces the necessary structs and apis to
create the cache entries that store the label information
associated to a given nexthop.
A hash table is created in each BGP instance for all the
AFIs: IPv4 and IPv6. That hash table is initialised.
An API to look and/or create an entry based on a given
nexthop.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>