Effectively a massive search and replace of
`struct thread` to `struct event`. Using the
term `thread` gives people the thought that
this event system is a pthread when it is not
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
This is a first in a series of commits, whose goal is to rename
the thread system in FRR to an event system. There is a continual
problem where people are confusing `struct thread` with a true
pthread. In reality, our entire thread.c is an event system.
In this commit rename the thread.[ch] files to event.[ch].
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
A new VTY command is introduced in ipv4 unicast and ipv6 unicast
address family, under a BGP instance.
> r1# label vpn export allocation-mode per-nexthop|per-vrf
This command will update the label values associated for each
BGP update to export to the global instance. Two modes are
available: per-nexthop and per-vrf. The latter is the default
one.
With this commit only, configuring label allocation per nexthop
will only reset the BGP updates, and the per-vrf mode label
allocation will be chosen.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
It was hard to catch those unless using higher values than uint32_t, but
already hit, it's time to fix completely.
Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
Since we increased peer->af_flags from uint32_t to uint64_t,
peer_af_flag_check() was historically returning integer, and not bool
as should be.
The bug was that if we have af_flags higher than uint32_t it will never
returned a right value.
Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
Implement: https://datatracker.ietf.org/doc/html/draft-abraitis-bgp-version-capability
Tested with GoBGP:
```
% ./gobgp neighbor 192.168.10.124
BGP neighbor is 192.168.10.124, remote AS 65001
BGP version 4, remote router ID 200.200.200.202
BGP state = ESTABLISHED, up for 00:01:49
BGP OutQ = 0, Flops = 0
Hold time is 3, keepalive interval is 1 seconds
Configured hold time is 90, keepalive interval is 30 seconds
Neighbor capabilities:
multiprotocol:
ipv4-unicast: advertised and received
ipv6-unicast: advertised
route-refresh: advertised and received
extended-nexthop: advertised
Local: nlri: ipv4-unicast, nexthop: ipv6
UnknownCapability(6): received
UnknownCapability(9): received
graceful-restart: advertised and received
Local: restart time 10 sec
ipv6-unicast
ipv4-unicast
Remote: restart time 120 sec, notification flag set
ipv4-unicast, forward flag set
4-octet-as: advertised and received
add-path: received
Remote:
ipv4-unicast: receive
enhanced-route-refresh: received
long-lived-graceful-restart: advertised and received
Local:
ipv6-unicast, restart time 10 sec
ipv4-unicast, restart time 20 sec
Remote:
ipv4-unicast, restart time 0 sec, forward flag set
fqdn: advertised and received
Local:
name: donatas-pc, domain:
Remote:
name: spine1-debian-11, domain:
software-version: advertised and received
Local:
GoBGP/3.10.0
Remote:
FRRouting/8.5-dev-MyOwnFRRVersion-gdc92f44a45-dirt
cisco-route-refresh: received
Message statistics:
```
FRR side:
```
root@spine1-debian-11:~# vtysh -c 'show bgp neighbor 192.168.10.17 json' | \
> jq '."192.168.10.17".neighborCapabilities.softwareVersion.receivedSoftwareVersion'
"GoBGP/3.10.0"
root@spine1-debian-11:~#
```
Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
The route-distinguisher string can be expressed in different
ways when the AS number is part of the RD. And the configured
string value has to be kept intact.
The following vty commands store the string value internally:
- router bgp / address-family ipv4 unicast / rd vpn export <>
- router bgp / address-family l2vpn evpn / rd <>
- router bgp / address-family l2vpn evpn / vni <> / rd <>
The vty commands where RD is configured in the below places is
not considered:
- router bgp / rfapi related commands
- router bgp / address-family xxx xxx / network .. rd <>
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
RD may be built based on an AS number. Like for the AS, the RD
may use the AS notation. The two below examples can illustrate:
RD 1.1:20 stands for an AS4B:NN RD with AS4B=65536 in dot format.
RD 0.1:20 stands for an AS2B:NNNN RD with AS2B=0.1 in dot+ format.
This commit adds the asnotation mode to prefix_rd2str() API so as
to pick up the relevant display.
Two new printfrr extensions are available to display the RD with
the two above display methods.
- The pRDD extension stands for dot asnotation format
- The pRDE extension stands for dot+ asnotation format.
- The pRD extension has been renamed to pRDP extension
The code is changed each time '%pRD' printf extension is called.
Possibly, the asnotation may change the output, then a macro defines
the asnotation mode to use. A side effect of forging the mode to
use is that the string could not be concatenated with other strings
in vty_out and snprintfrr. Those functions have been called multiple
times. When zlog_debug needs to display the RD with some other string,
the prefix_rd2str() old API is used instead of the printf extension.
Some code has been kept untouched:
- code related to running-config. Actually, wherever an RD is displayed,
its configured name should be dumped.
- bgp rfapi code
- bgp evpn multihoming code (partially done), since the logic is
missing to get the asnotation of 'struct bgp_evpn_es'.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
The as-notation mode of the BGP instance will impact the way
the neighbor AS information is dumped in the show commands.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
The confederation peers as and the confederation identifier as
are stored as a string to preserve the output in the running
configuration.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
This identifier is used to display the peer configuration in
the running-config, like it has been configured.
The following commands are using a specific string attribute:
- neighbor .. remote-as ASN
- neighbor .. local-as ASN
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
A json AS number API is created in order to output a
given AS number. In order to keep backward compatibility,
if the as-notation uses a number, then the json is encoded
as an integer, otherwise the encoding will be a string.
For what is not relevant to running-configuration, the
as-notation mode is the one used for the BGP instance.
Also, the vty completion gets the configured 'as_pretty'
string value, when an user wants to get the available
BGP instances.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
A new keyword permits changing the BGP as-notation output:
- [no] router bgp <> [vrf BLABLA] [as-notation [<dot|plain|dot+>]]
At the BGP instance creation, the output will inherit the way the
BGP instance is declared. For instance, the 'router bgp 1.1'
command will configure the output in the dot format. However, if
the client wants to choose an alternate output, he will have to
add the extra command: 'router bgp 1.1 as-notation dot+'.
Also, if the user wants to have plain format, even if the BGP
instance is declared in dot format, the keyword can also be used
for that.
The as-notation output is only taken into account at the BGP
instance creation. In the case where VPN instances are used,
a separate instance may be dynamically created. In that case,
the real as-notation format will be taken into acccount at the
first configuration.
Linking the as-notation format with the BGP instance makes sense,
as the operators want to keep consistency of what they configure.
One technical reason why to link the as-notation output with the
BGP instance creation is that the as-path segment lists stored
in the BGP updates use a string representation to handle aspath
operations (by using regexp for instance). Changing on the fly
the output needs to regenerate this string representation to the
correct format. Linking the configuration to the BGP instance
creation avoids refreshing the BGP updates. A similar mechanism
is put in place in junos too.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
AS number can be defined as an unsigned long number, or
two uint16 values separated by a period (.). The possible
valus are:
- usual 32 bit values : [1;2^32 -1]
- <1.65535>.<0.65535> for dot notation
- <0.65535>.<0.65535> for dot+ notation.
The 0.0 value is forbidden when configuring BGP instances
or peer configurations.
A new ASN type is added for parsing in the vty.
The following commands use that new identifier:
- router bgp ..
- bgp confederation ..
- neighbor <> remote-as <>
- neighbor <> local-as <>
- clear ip bgp <>
- route-map / set as-path <>
An asn library is available in lib/ and provides some
services:
- convert an as string into an as number.
- parse an as path list string and extract a number.
- convert an as number into a string.
Also, the bgp tests forge an as_zero_path, and to do that,
an API to relax the possibility to have a 0 as value is
specifically called from the tests.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
This is a preliminary work to handle various ways to configure
a BGP Autonomous System. When creating a BGP instance, the
user may want to define the AS number as a dotted value,
instead of using an integer value.
To handle both cases, an as_pretty char attribute will store
the as number as it has been given to the vtysh command:
router bgp <as number>
Whenever the as integer of the BGP instance was dumped,
the as_pretty original format is used.
The json output reuses the integer value to keep backward
compatibility with old displays.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
The input queue limit does not belong under router bgp. This
is a dev escape and should just be removed.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Consider this scenario:
Lots of peers with a bunch of route information that is changing
fast. One of the peers happens to be really slow for whatever
reason. The way the output queue is filled is that bgpd puts
64 packets at a time and then reschedules itself to send more
in the future. Now suppose that peer has hit it's input Queue
limit and is slow. As such bgp will continue to add data to
the output Queue, irrelevant if the other side is receiving
this data.
Let's limit the Output Queue to the same limit as the Input
Queue. This should prevent bgp eating up large amounts of
memory as stream data when under severe network trauma.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
The idea is to drop unwanted attributes from the BGP UPDATE messages and
continue by just ignoring them. This improves the security, flexiblity, etc.
This is the command that Cisco has also.
Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
For now, if the order was mixed, most of the commands are just silently
ignored. Let the operator notice that.
Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
Some keys are only present in the JSON data of BGP neighbors are only present if the peer is, or has previously been established.
While they are not present if the peer has never come up.
To keep the data structure aligned, the below keys are added also to the neighbors that BGP adjacency has never been established.
Values of the keys are all set to Unknown
hostname:Unknown,
nexthop:Unknown,
nexthopGlobal:Unknown,
nexthopLocal:Unknown,
bgpConnection:Unknown,
Signed-off-by: Karl Quan <kquan@nvidia.com>
When actually creating a peer in BGP, tell the creation if
it is a config node or not. There were cases where the
CONFIG_NODE was being set *after* being placed into
the bgp->peerhash, thus causing collisions between the
doppelganger and the peer and eventually use after free's.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Use %pI4/%pI6 where possible, otherwise at least atjust stack buffer sizes
for inet_ntop() calls.
Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
currently the following configuration
dut:
!
interface ntfp2
ip router isis 1
!
router bgp 200
no bgp ebgp-requires-policy
bgp confederation identifier 300
bgp confederation peers 300
neighbor 192.168.1.1 remote-as 100
neighbor 192.168.2.2 remote-as 300
!
address-family ipv4 unicast
neighbor 192.168.2.2 default-originate
exit-address-family
!
router isis 1
is-type level-2-only
net 49.0001.0002.0002.0002.00
redistribute ipv4 connected level-2
!
end
router:
!
interface ntfp2
ip router isis 1
isis circuit-type level-2-only
!
router bgp 300
no bgp ebgp-requires-policy
bgp confederation identifier 300
bgp confederation peers 200
neighbor 192.168.2.1 remote-as 200
neighbor 192.168.3.2 remote-as 400
!
address-family ipv4 unicast
network 3.3.3.0/24
exit-address-family
!
router isis 1
is-type level-2-only
net 49.0001.0003.0003.0003.00
redistribute ipv4 connected level-2
!
end
on dut result of show bgp ipv4 unicast command is:
show bgp ipv4 unicast
BGP table version is 1, local router ID is 192.168.2.1, vrf id 0
Default local pref 100, local AS 200
Status codes: s suppressed, d damped, h history, * valid, > best, = multipath,
i internal, r RIB-failure, S Stale, R Removed
Nexthop codes: @NNN nexthop's vrf id, < announce-nh-self
Origin codes: i - IGP, e - EGP, ? - incomplete
RPKI validation codes: V valid, I invalid, N Not found
Network Next Hop Metric LocPrf Weight Path
*> 1.1.1.0/24 192.168.1.1 0 0 100 i
instead of
sho bgp ipv4 unicast
BGP table version is 3, local router ID is 192.168.2.1, vrf id 0
Default local pref 100, local AS 200
Status codes: s suppressed, d damped, h history, * valid, > best, = multipath,
i internal, r RIB-failure, S Stale, R Removed
Nexthop codes: @NNN nexthop's vrf id, < announce-nh-self
Origin codes: i - IGP, e - EGP, ? - incomplete
RPKI validation codes: V valid, I invalid, N Not found
Network Next Hop Metric LocPrf Weight Path
*> 1.1.1.0/24 192.168.1.1 0 0 100 i
*> 3.3.3.0/24 192.168.2.2 0 100 0 (300) i
*> 4.4.4.0/24 192.168.3.2 0 100 0 (300) 400 i
Displayed 3 routes and 3 total paths
According to RFC 5065:the usage of one of the member AS number as the
confederation identifier is not forbidden.
fixes are the following
in bgp_route.c:
in bgp_update remove the test for presence of confederation id in
as_path since, this case is allowed;
in bgp_vty.c
bgp_confederation_peers, remove the test on peer as value
in bgpd.c
bgp_confederation_peers_add
remove the test on peer as value
invert the order of setting peer->sort value and peer->local_as,
since peer->sort is depending from current peer->local_as value
bgp_confederation_peers_remove
invert the order of setting peer->sort value and peer->local_as,
since peer->sort is depending from current peer->local_as value
Signed-off-by: Francois Dumontet <francois.dumontet@6wind.com>
Previously BGP supported up to 255 SIDs.
The PR https://github.com/FRRouting/frr/pull/11981 extended the
transposition computation algorithm in BGP to support more SIDs (up to
1048575 SIDs).
However the BGP VTY command for allocating an SRv6 per-VRF SID
(`sid vpn per-vrf export`) is still limited to 255 SIDs.
This commit extends the SID index in `sid vpn per-vrf export` VTY
command to support up to 1048575 SIDs.
Signed-off-by: Carmine Scarpitta <carmine.scarpitta@uniroma2.it>
We already have a global knob for graceful-shutdown, but it's handy having
per neighbor knob as well.
Especially when a single neighbor needs to be restarted/shutdown gracefuly.
We can do this route-maps, but this is a faster/cleaner way doing the same
for an operator.
Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
Simulated latency with:
```
tc qdisc add dev eth3 root netem delay 100ms
```
```
donatas-laptop# sh ip bgp summary failed
IPv4 Unicast Summary (VRF default):
BGP router identifier 192.0.2.252, local AS number 65000 vrf-id 0
BGP table version 28
RIB entries 0, using 0 bytes of memory
Peers 1, using 724 KiB of memory
Neighbor EstdCnt DropCnt ResetTime Reason
192.168.10.65 2 2 00:00:17 Admin. shutdown (RTT)
Displayed neighbors 1
Total number of neighbors 1
donatas-laptop#
```
Another end received:
```
%NOTIFICATION: received from neighbor 192.168.10.17 6/2 (Cease/Administrative Shutdown) "shutdown due to high round-trip-time (104ms > 5ms, hit 21 times)"
```
Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
`srv6_locator_chunk_free()` is a wrapper around the `XFREE()` macro.
Passing a NULL pointer to `XFREE()` is safe. Therefore, checking that
the pointer passed to the `srv6_locator_chunk_free()` is not null is
unnecessary.
Signed-off-by: Carmine Scarpitta <carmine.scarpitta@uniroma2.it>
`srv6_locator_chunk_free()` takes care of freeing the memory allocated
for a `struct srv6_locator_chunk` and setting the
`struct srv6_locator_chunk` pointer to NULL.
It is not necessary to explicitly set the pointer to NULL after invoking
`srv6_locator_chunk_free()`.
Signed-off-by: Carmine Scarpitta <carmine.scarpitta@uniroma2.it>
A programmer can use the `srv6_locator_chunk_free()` function to free
the memory allocated for a `struct srv6_locator_chunk`.
The programmer invokes `srv6_locator_chunk_free()` by passing a single
pointer to the `struct srv6_locator_chunk` to be freed.
`srv6_locator_chunk_free()` uses `XFREE()` to free the memory.
It is the responsibility of the programmer to set the
`struct srv6_locator_chunk` pointer to NULL after freeing memory with
`srv6_locator_chunk_free()`.
This commit modifies the `srv6_locator_chunk_free()` function to take a
double pointer instead of a single pointer. In this way, setting the
`struct srv6_locator_chunk` pointer to NULL is no longer the
programmer's responsibility but is the responsibility of
`srv6_locator_chunk_free()`. This prevents programmers from making
mistakes such as forgetting to set the pointer to NULL after invoking
`srv6_locator_chunk_free()`.
Signed-off-by: Carmine Scarpitta <carmine.scarpitta@uniroma2.it>
Rather than running selected source files through the preprocessor and a
bunch of perl regex'ing to get the list of all DEFUNs, use the data
collected in frr.xref.
This not only eliminates issues we've been having with preprocessor
failures due to nonexistent header files, but is also much faster.
Where extract.pl would take 5s, this now finishes in 0.2s. And since
this is a non-parallelizable build step towards the end of the build
(dependent on a lot of other things being done already), the speedup is
actually noticeable.
Also files containing CLI no longer need to be listed in `vtysh_scan`
since the .xref data covers everything. `#ifndef VTYSH_EXTRACT_PL`
checks are equally obsolete.
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
Add a default limit to the InQ for messages off the bgp peer
socket. Make the limit configurable via cli.
Adding in this limit causes the messages to be retained in the tcp
socket and allow for tcp back pressure and congestion control to kick
in.
Before this change, we allow the InQ to grow indefinitely just taking
messages off the socket and adding them to the fifo queue, never letting
the kernel know we need to slow down. We were seeing under high loads of
messages and large perf-heavy routemaps (regex matching) this queue
would cause a memory spike and BGP would get OOM killed. Modifying this
leaves the messages in the socket and distributes that load where it
should be in the socket buffers on both send/recv while we handle the
mesages.
Also, changes were made to allow the ringbuffer to hold messages and
continue to be filled by the IO pthread while we wait for the Main
pthread to handle the work on the InQ.
Memory spike seen with large numbers of routes flapping and route-maps
with dozens of regex matching:
```
Memory statistics for bgpd:
System allocator statistics:
Total heap allocated: > 2GB
Holding block headers: 516 KiB
Used small blocks: 0 bytes
Used ordinary blocks: 160 MiB
Free small blocks: 3680 bytes
Free ordinary blocks: > 2GB
Ordinary blocks: 121244
Small blocks: 83
Holding blocks: 1
```
With most of it being held by the inQ (seen from the stream datastructure info here):
```
Type : Current# Size Total Max# MaxBytes
...
...
Stream : 115543 variable 26963208 15970740 3571708768
```
With this change that memory is capped and load is left in the sockets:
RECV Side:
```
State Recv-Q Send-Q Local Address:Port Peer Address:Port Process
ESTAB 265350 0 [fe80::4080:30ff:feb0:cee3]%veth1:36950 [fe80::4c14:9cff:fe1d:5bfd]:179 users:(("bgpd",pid=1393334,fd=26))
skmem:(r403688,rb425984,t0,tb425984,f1816,w0,o0,bl0,d61)
```
SEND Side:
```
State Recv-Q Send-Q Local Address:Port Peer Address:Port Process
ESTAB 0 1275012 [fe80::4c14:9cff:fe1d:5bfd]%veth1:179 [fe80::4080:30ff:feb0:cee3]:36950 users:(("bgpd",pid=1393443,fd=27))
skmem:(r0,rb131072,t0,tb1453568,f1916,w1300612,o0,bl0,d0)
```
Signed-off-by: Stephen Worley <sworley@nvidia.com>
Ensure that un-configuring allowas-in for a peer or group
clears the related flags and integer value. Tighten the use
of the integer counter so that it's only used when the config
flag is set. Add show output if allowas-in is enabled.
Signed-off-by: Mark Stapp <mstapp@nvidia.com>
The command `sid vpn per-vrf export (1-255)|auto` can be used to export
IPv4 and IPv6 routes from a VRF to the VPN RIB using a single SRv6 SID
(End.DT46 behavior).
This commit implements the no form of the above command, which can be
used to disable the export of the IPv4/IPv6 routes:
`no sid vpn per-vrf export`.
Signed-off-by: Carmine Scarpitta <carmine.scarpitta@uniroma2.it>
This commit adds the per-VRF SID chosen to advertise L3VPN for IPv4 and IPv6 address families using a single SID to the bgpd configuration.
Signed-off-by: Carmine Scarpitta <carmine.scarpitta@uniroma2.it>
In the current implementation of bgpd, SRv6 SIDs can be configured only
under the address-family. This enables bgpd to leak IPv6 routes using
an SRv6 End.DT6 behavior and IPv4 routes using an SRv6 End.DT4
behavior. It is not possible to leak both IPv6 and IPv4 routes using a
single SRv6 SID.
This commit adds a new CLI command
"sid vpn per-vrf export <sid_idx|auto>" that enables bgpd to leak both
IPv6 and IPv4 routes using a single SRv6 SID (End.DT46 behavior).
Signed-off-by: Carmine Scarpitta <carmine.scarpitta@uniroma2.it>
When an SRv6 locator is unset, all the SRv6 SIDs allocated from the
locator are removed. Before freeing the memory allocated for an SRv6
SID, we check if the pointer to the SID is `NULL`.
However, checking for `NULL` before freeing memory is useless.
This PR aims to improve the code's readability by removing the
useless `NULL` checks.
Signed-off-by: Carmine Scarpitta <carmine.scarpitta@uniroma2.it>
In order to send correct SRv6 L3VPN advertisement, we need to save
srv6_locator_chunk in vpn_policy. With this information, we can
construct correct SRv6 L3VPN advertisement packets.
Signed-off-by: Ryoga Saito <ryoga.saito@linecorp.com>
As an example, Arista EOS allows this behavior.
Configuration something like:
```
neighbor PG peer-group
neighbor PG remote-as 65001
neighbor PG local-as 65001
neighbor 192.168.10.124 peer-group PG
```
Or without peer-group.
Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
RFC4364 describes peerings between multiple AS domains, to ease
the continuity of VPN services across multiple SPs. This commit
implements a sub-set of IETF option b) described in chapter 10 b.
The ASBR to ASBR approach is taken, with an EBGP peering between
the two routers. The EBGP peering must be directly connected to
the outgoing interface used. In those conditions, the next hop
is directly connected, and there is no need to have a transport
label to convey the VPN label. A new vty command is added on a
per interface basis:
This command if enabled, will permit to convey BGP VPN labels
without any transport labels (i.e. with implicit-null label).
restriction:
this command is used only for EBGP directly connected peerings.
Other use cases are not covered.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
TCP keepalive is enabled once BGP connection is established.
New vty commands:
bgp tcp-keepalive <1-65535> <1-65535> <1-30>
no bgp tcp-keepalive
Signed-off-by: Xiaofeng Liu <xiaofeng.liu@6wind.com>
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>