Append zebra and lib to use muliple SRv6 segs SIDs, and keep one
seg SID for bgpd and sharpd.
Note: bgpd and sharpd compilation relies on the lib and zebra files,
i.e if we separate this: lib or zebra or bgpd or sharpd in different
commits - this will not compile.
Signed-off-by: Dmytro Shytyi <dmytro.shytyi@6wind.com>
Link-state prefixes are only intended to be read for a link-state
consumer (i.e. a controler). They cannot be installed in Forwarding
Information Base (FIB).
Do not announce them to zebra.
Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
Modify bgp_fsm_change_status to be connection oriented and
also make the BGP_TIMER_ON and BGP_EVENT_ADD macros connection
oriented as well. Attempt to make peer_xfer_conn a bit more
understandable because, frankly it was/is confusing.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
When the BGP 'redistribute table' command is used for a given route
table, and BGP configuration is flushed and rebuilt, the redistribution
does not work.
Actually, when flushing the BGP configuration with the 'no router bgp'
command, the BGP redistribute entries related to the 'redistribute table'
entries are not flushed. Actually, at BGP deletion, the table number is
not given as parameter in bgp_redistribute_unset() function, and the
redistribution entry is not removed in zebra.
Fix this by adding some code to flush all the redistribute table
instances.
Fixes: 7c8ff89e93 ("Multi-Instance OSPF Summary")
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
This is based on @donaldsharp's work
The current code base is the struct bgp_node data structure.
The problem with this is that it creates a bunch of
extra data per route_node.
The table structure generates ‘holder’ nodes
that are never going to receive bgp routes,
and now the memory of those nodes is allocated
as if they are a full bgp_node.
After splitting up the bgp_node into bgp_dest and route_node,
the memory of ‘holder’ node which does not have any bgp data
will be allocated as the route_node, not the bgp_node,
and the memory usage is reduced.
The memory usage of BGP node will be reduced from 200B to 96B.
The total memory usage optimization of this part is ~16.00%.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Signed-off-by: Yuqing Zhao <xiaopanghu99@163.com>
In the netlink-mediated kernel dataplane, each rule is stored
in either an IPv4-specific database or an IPv6-specific database.
PBRD opportunistically gleans each rule's address family value
from its source or destination IP address match value (if either
exists), or from its nexthop or nexthop-group (if it exists).
The 'family' value is particularly needed for netlink during
incremental rule deletion when none of the above fields remain set.
Before now, this address family has been encoded by occult means
in the (possibly otherwise unset) source/destination IP match
fields in ZAPI and zebra.
This commit documents the reasons for maintaining the 'family'
field in the PBRD rule structure, adds a 'family' field in the
common lib/pbr.h rule structure, and carries it explicitly in ZAPI.
Signed-off-by: G. Paul Ziemba <paulz@labn.net>
Even if some of the attributes in bgp_path_info_extra are
not used, their memory is still allocated every time. It
cause a waste of memory.
This commit code deletes all unnecessary attributes and
changes the optional attributes to pointer storage. Memory
will only be allocated when they are actually used. After
optimization, extra info related memory is reduced by about
half(~400B -> ~200B).
Signed-off-by: Valerian_He <1826906282@qq.com>
Over time the bgp_zebra_announce_default function has gotten
slightly convoluted, clean it up so it's easier to read
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
bgpd, pbrd: use common pbr encoder
zebra: use common pbr decoder
tests: pbr_topo1: check more filter fields
Purpose:
1. Reduce likelihood of zapi format mismatches when adding
PBR fields due to multiple parallel encoder implementations
2. Encourage common PBR structure usage among various daemons
3. Reduce coding errors via explicit per-field enable flags
Signed-off-by: G. Paul Ziemba <paulz@labn.net>
Subset: ZAPI changes to send the new data
Also adds filter_bm field; currently for PBR_FILTER_PCP, but in the
future to be used for all of the filter fields.
Changes by:
Josh Werner <joshuawerner@mitre.org>
Eli Baum <ebaum@mitre.org>
G. Paul Ziemba <paulz@labn.net>
Signed-off-by: G. Paul Ziemba <paulz@labn.net>
bgp_zebra_tm_connect calls bgp_zebra_get_table_range which
just used the global zclient. Which of course still had
us exposing the global zclient to read and drop important
data from zebra. This fixes commit 787c61e03c
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
If this an error, we should use zlog_err, not zlog_info as this is literally
not an information, but an error.
Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
When running all daemons with config for most of them, FRR has
sharpd@janelle:~/frr$ vtysh -c "show debug hashtable" | grep "VRF BIT HASH" | wc -l
3570
3570 hashes for bitmaps associated with the vrf. This is a very
large number of hashes. Let's do two things:
a) Reduce the created size of the actually created hashes to 2
instead of 32.
b) Delay generation of the hash *until* a set operation happens.
As that no hash directly implies a unset value if/when checked.
This reduces the number of hashes to 61 in my setup for normal
operation.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Both the label manager and table manager zapi code send data requests via zapi
to zebra and then immediately listen for a response from zebra. The problem here
is of course that the listen part is throwing away any zapi command that is not
the one it is looking for.
ISIS/OSPF and PIM all have synchronous abilities via zapi, which they all
do through a special zapi connection to zebra. BGP needs to follow this model
as well. Additionally the new zclient_sync connection that should be created,
a once a second timer should wake up and read any data on the socket to
prevent problems too much data accumulating in the socket.
```
r3# sh bgp labelpool summary
Labelpool Summary
-----------------
Ledger: 3
InUse: 3
Requests: 0
LabelChunks: 1
Pending: 128
Reconnects: 1
r3# sh bgp labelpool inuse
Prefix Label
---------------------------
10.0.0.1/32 16
192.168.31.0/24 17
192.168.32.0/24 18
r3#
```
Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
When advertising an mpls vpn entry with a new label,
the return traffic is redirected to the local machine,
but the MPLS traffic is dropped.
Add an MPLS entry to handle MPLS packets which have
the new label value. Traffic is swapped to the original
label value from the mpls vpn next-hop entry; then it is
sent to the resolved next-hop of the original next-hop
from the mpls vpn next-hop entry.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
For whatever reason, we were only updating tip_hash when we processed an
L2VNI add/del. This adds tip_hash updates to the L3VNI add/del codepaths
so that their VTEP-IPs are also used when when considering martian
addresses, e.g. bgp_nexthop_self().
Signed-off-by: Trey Aspelund <taspelund@nvidia.com>
BGP MPLSVPN next hop label allocation was using only the next-hop
IP address. As MPLSVPN contexts rely on bnc contexts, the real
nexthop interface is known, and the LSP entry to enter can apply
to the specific interface. To illustrate, the BGP service is able
to handle the following two iproute2 commands:
> ip -f mpls route add 105 via inet 192.0.2.45 dev r1-eth1
> ip -f mpls route add 105 via inet 192.0.2.46 dev r1-eth2
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
This commit introduces a new method to associate a label to
prefixes to export to a VPNv4 backbone. All the methods to
associate a label to a BGP update is documented in rfc4364,
chapter 4.3.2. Initially, the "single label for an entire
VRF" method was available. This commit adds "single label
for each attachment circuit" method.
The change impacts the control-plane, because each BGP update
is checked to know if the nexthop has reachability in the VRF
or not. If this is the case, then a unique label for a given
destination IP in the VRF will be picked up. This label will
be reused for an other BGP update that will have the same
nexthop IP address.
The change impacts the data-plane, because the MPLs pop
mechanism applied to incoming labelled packets changes: the
MPLS label is popped, and the packet is directly sent to the
connected nexthop described in the previous outgoing BGP VPN
update.
By default per-vrf mode is done, but the user may choose
the per-nexthop mode, by using the vty command from the
previous commit. In the latter case, a per-vrf label
will however be allocated to handle networks that are not directly
connected. This is the case for local traffic for instance.
The change also include the following:
- ECMP case
In case a route is learnt in a given VRF, and is resolved via an
ECMP nexthop. This implies that when exporting the route as a BGP
update, if label allocation per nexthop is used, then two possible
MPLS values could be picked up, which is not possible with the
current implementation. Actually, the NLRI for VPNv4 stores one
prefix, and one single label value, not two. Today, RFC8277 with
multiple label capability is not yet available.
To avoid this corner case, when a route is resolved via more than one
nexthop, the label allocation per nexthop will not apply, and the
default per-vrf label will be chosen.
Let us imagine BGP redistributes a static route using the `172.31.0.20`
nexthop. The nexthop resolution will find two different nexthops fo a
unique BGP update.
> r1# show running-config
> [..]
> vrf vrf1
> ip route 172.31.0.30/32 172.31.0.20
> r1# show bgp vrf vrf1 nexthop
> [..]
> 172.31.0.20 valid [IGP metric 0], #paths 1
> gate 192.0.2.11
> gate 192.0.2.12
> Last update: Mon Jan 16 09:27:09 2023
> Paths:
> 1/1 172.31.0.30/32 VRF vrf1 flags 0x20018
To avoid this situation, BGP updates that resolve over multiple
nexthops are using the unique per-vrf label.
- recursive route case
Prefixes that need a recursive route to be resolved can
also be eligible for mpls allocation per nexthop. In that
case, the nexthop will be the recursive nexthop calculated.
To achieve this, all nexthop types in bnc contexts are valid,
except for the blackhole nexthops.
- network declared prefixes
Nexthop tracking is used to look for the reachability of the
prefixes. When the the 'no bgp network import-check' command
is used, network declared prefixes are maintained active,
even if there is no active nexthop.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Effectively a massive search and replace of
`struct thread` to `struct event`. Using the
term `thread` gives people the thought that
this event system is a pthread when it is not
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
This is a first in a series of commits, whose goal is to rename
the thread system in FRR to an event system. There is a continual
problem where people are confusing `struct thread` with a true
pthread. In reality, our entire thread.c is an event system.
In this commit rename the thread.[ch] files to event.[ch].
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
The flag for telling BGP that a route is expected to be installed
first before notifying a peer was always being set upon receipt
of a path that could be accepted as bestpath. This is not correct:
imagine that you have a peer sending you a route and you have a
network statement that covers the same route. Irrelevant if the
network statement would win the flag on the dest was being set
in bgp_update. Thus you could get into a situation where
the network statement path wins but since the flag is set on
the node, it will never be announced to a peer.
Let's just move the setting of the flag into bgp_zebra_announce
and _withdraw. In _announce set the flag to TRUE when suppress-fib
is enabled. In _withdraw just always unset the flag as that a withdrawal
does not need to wait for rib removal before announcing. This will
cover the case when a network statement is added after the route has
been learned from a peer.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
BGP MPLSVPN next hop label allocation was using only the next-hop
IP address. As MPLSVPN contexts rely on bnc contexts, the real
nexthop interface is known, and the LSP entry to enter can apply
to the specific interface. To illustrate, the BGP service is able
to handle the following two iproute2 commands:
> ip -f mpls route add 105 via inet 192.0.2.45 dev r1-eth1
> ip -f mpls route add 105 via inet 192.0.2.46 dev r1-eth2
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
This commit introduces a new method to associate a label to
prefixes to export to a VPNv4 backbone. All the methods to
associate a label to a BGP update is documented in rfc4364,
chapter 4.3.2. Initially, the "single label for an entire
VRF" method was available. This commit adds "single label
for each attachment circuit" method.
The change impacts the control-plane, because each BGP update
is checked to know if the nexthop has reachability in the VRF
or not. If this is the case, then a unique label for a given
destination IP in the VRF will be picked up. This label will
be reused for an other BGP update that will have the same
nexthop IP address.
The change impacts the data-plane, because the MPLs pop
mechanism applied to incoming labelled packets changes: the
MPLS label is popped, and the packet is directly sent to the
connected nexthop described in the previous outgoing BGP VPN
update.
By default per-vrf mode is done, but the user may choose
the per-nexthop mode, by using the vty command from the
previous commit. In the latter case, a per-vrf label
will however be allocated to handle networks that are not directly
connected. This is the case for local traffic for instance.
The change also include the following:
- ECMP case
In case a route is learnt in a given VRF, and is resolved via an
ECMP nexthop. This implies that when exporting the route as a BGP
update, if label allocation per nexthop is used, then two possible
MPLS values could be picked up, which is not possible with the
current implementation. Actually, the NLRI for VPNv4 stores one
prefix, and one single label value, not two. Today, RFC8277 with
multiple label capability is not yet available.
To avoid this corner case, when a route is resolved via more than one
nexthop, the label allocation per nexthop will not apply, and the
default per-vrf label will be chosen.
Let us imagine BGP redistributes a static route using the `172.31.0.20`
nexthop. The nexthop resolution will find two different nexthops fo a
unique BGP update.
> r1# show running-config
> [..]
> vrf vrf1
> ip route 172.31.0.30/32 172.31.0.20
> r1# show bgp vrf vrf1 nexthop
> [..]
> 172.31.0.20 valid [IGP metric 0], #paths 1
> gate 192.0.2.11
> gate 192.0.2.12
> Last update: Mon Jan 16 09:27:09 2023
> Paths:
> 1/1 172.31.0.30/32 VRF vrf1 flags 0x20018
To avoid this situation, BGP updates that resolve over multiple
nexthops are using the unique per-vrf label.
- recursive route case
Prefixes that need a recursive route to be resolved can
also be eligible for mpls allocation per nexthop. In that
case, the nexthop will be the recursive nexthop calculated.
To achieve this, all nexthop types in bnc contexts are valid,
except for the blackhole nexthops.
- network declared prefixes
Nexthop tracking is used to look for the reachability of the
prefixes. When the the 'no bgp network import-check' command
is used, network declared prefixes are maintained active,
even if there is no active nexthop.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
There are some specific edge-cases when is a need to run FRR and another FRR
and/or another BGP implementation on the same box. Relaxing 127.0.0.0/8 for
this case might be reasonable.
An example below peering via 127.0.0.0/8 between FRR and GoBGP:
```
% ss -ntlp | grep 179
LISTEN 0 4096 127.0.0.1:179 0.0.0.0:*
LISTEN 0 128 127.0.0.2:179 0.0.0.0:*
% grep 127.0.0.2 /etc/frr/daemons
bgpd_options=" -A 127.0.0.1 -l 127.0.0.2"
% grep local /etc/gobgp/config.toml
local-address-list = ["127.0.0.1"]
donatas-pc# sh ip bgp summary
IPv4 Unicast Summary (VRF default):
BGP router identifier 192.168.10.17, local AS number 65001 vrf-id 0
BGP table version 0
RIB entries 0, using 0 bytes of memory
Peers 1, using 725 KiB of memory
Neighbor V AS MsgRcvd MsgSent TblVer InQ OutQ Up/Down State/PfxRcd PfxSnt Desc
127.0.0.1 4 65002 7 7 0 0 0 00:02:02 0 0 N/A
Total number of neighbors 1
donatas-pc#
```
Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
a) Make it legible what type of message is being passed
back and forth instead of having to guess it from
the insufficient debugs
b) Make it explicit which bgp instance is sending this
data
c) Cleanup bgp_zebra_update to have a cleaner api
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Static Analysis caught a bug where we could be reading
garbage values for labels/num_lables. Fix that by
ensuring it's set to NULL/0 per loop of the mpath.
Signed-off-by: Stephen Worley <sworley@nvidia.com>
Add some bgp_path_info helper functions for getting the correct l3vni
label, getting the vni from the label stack, and determinging if
the mpath is D-VNI based.
Signed-off-by: Stephen Worley <sworley@nvidia.com>
Add functionality to always send the L3VNI to zebra as a label
on the route. It will be zebra's job to determine how to use it (i.e.
via Single Vxlan Device or not).
The l3VNI according to rfc should always be the second for a type2 route
and be the only one available for a type5. Hence, we can just grab the
last label in the stack here and add it onto the route.
Signed-off-by: Stephen Worley <sworley@nvidia.com>
Take it into consideration for one debug log:
EVPN MAC-IP routes with a L3 NHG id, has no nexthops.
Not "delete", but "add".
Before:
```
Tx route delete VRF 21 192.168.30.253/32 metric 0 tag 0 count 0 nhg 72580649
```
After:
```
Tx route add VRF 21 192.168.30.253/32 metric 0 tag 0 count 0 nhg 72580649
```
Signed-off-by: anlan_cs <vic.lan@pica8.com>
Current bgpd can't annouce SRv6 locally-generated routes to Zebra
correctly because MPLS label of locally-generated routes is not valid
but sid_info->transposition_len is set to non-zero value. This commit
fixes such kind of issues.
Signed-off-by: Ryoga Saito <ryoga.saito@linecorp.com>
When the last IPv4 address of an interface is deleted, Linux removes all
routes includes BGP ones using this interface without any Netlink
advertisement. bgpd keeps them in RIB as valid (e.g. installed in FIB).
The previous patch invalidates the associated nexthop groups in zebra
but bgpd is not notified of the event.
> 2022/05/09 17:37:52.925 ZEBRA: [TQKA8-0276P] Not Notifying Owner: connected about prefix 29.0.0.0/24(40) 3 vrf: 7
Look for the bgp_path_info that are unsynchronized with the kernel and
flag them for refresh in their attributes. A VPN route leaking update is
calles and the refresh flag triggers a route refresh to zebra and then a
kernel FIB installation.
Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
Previous commits have introduced a new 8 bits nh_flag in the attr
struct that has increased the memory footprint.
Move the mp_nexthop_prefer_global boolean in the attr structure that
takes 8 bits to the new nh_flag in order to go back to the previous
memory utilization.
Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
Leaked recursive routes are not resolved.
> VRF r1-cust1:
> B> 5.1.0.0/24 [200/98] via 99.0.0.1 (recursive), weight 1, 00:00:08
> * via 192.168.1.2, r1-eth4, weight 1, 00:00:08
> B>* 99.0.0.1/32 [200/0] via 192.168.1.2, r1-eth4, weight 1, 00:00:08
> VRF r1-cust4:
> B 5.1.0.0/24 [20/98] via 99.0.0.1 (vrf r1-cust1) inactive, weight 1, 00:00:08
> B>* 99.0.0.1/32 [20/0] via 192.168.1.2, r1-eth4 (vrf r1-cust1), weight 1, 00:00:08
When announcing the routes to zebra, use the peer of the ultimate bgp
path info instead of the one of the first parent path info to determine
whether the route is recursive.
The result is:
> VRF r1-cust4:
> B> 5.1.0.0/24 [20/98] via 99.0.0.1 (vrf r1-cust1) (recursive), weight 1, 00:00:02
> * via 192.168.1.2, r1-eth4 (vrf r1-cust1), weight 1, 00:00:02
> B>* 99.0.0.1/32 [20/0] via 192.168.1.2, r1-eth4 (vrf r1-cust1), weight 1, 00:00:02
Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
At bgpd startup, VRF instances are sent from zebra before the
interfaces. When importing a l3vpn prefix from another local VRF
instance, the interfaces are not known yet. The prefix nexthop interface
cannot be set to the loopback or the VRF interface, which causes setting
invalid routes in zebra.
Update route leaking when the loopback or a VRF interface is received
from zebra.
At a VRF interface deletion, zebra voluntarily sends a
ZEBRA_INTERFACE_ADD message to move it to VRF_DEFAULT. Do not update if
such a message is received. VRF destruction will destroy all the related
routes without adding codes.
Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
Use %pI4/%pI6 where possible, otherwise at least atjust stack buffer sizes
for inet_ntop() calls.
Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
LL address is assigned, but we get a warning, that it's not:
Interface: enp3s0 does not have a v6 LL address associated with it, waiting until one is created for it
```
donatas-pc# sh int enp3s0
Interface enp3s0 is up, line protocol is up
Link ups: 0 last: (never)
Link downs: 0 last: (never)
vrf: default
index 2 metric 0 mtu 1500 speed 100
flags: <UP,BROADCAST,RUNNING,MULTICAST>
v4 Multicast forwarding is on
v6 Multicast forwarding is on
Type: Ethernet
HWaddr: 18:c0:4d:96:fa:3f
inet 192.168.10.17/24
inet6 2a02:4780:abc:0:e776:6220:1e21:44b1/64
inet6 fe80::ca5d:fd0d:cd8:1bb7/64
```
Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
`srv6_locator_chunk_free()` takes care of freeing the memory allocated
for a `struct srv6_locator_chunk` and setting the
`struct srv6_locator_chunk` pointer to NULL.
It is not necessary to explicitly set the pointer to NULL after invoking
`srv6_locator_chunk_free()`.
Signed-off-by: Carmine Scarpitta <carmine.scarpitta@uniroma2.it>
A programmer can use the `srv6_locator_chunk_free()` function to free
the memory allocated for a `struct srv6_locator_chunk`.
The programmer invokes `srv6_locator_chunk_free()` by passing a single
pointer to the `struct srv6_locator_chunk` to be freed.
`srv6_locator_chunk_free()` uses `XFREE()` to free the memory.
It is the responsibility of the programmer to set the
`struct srv6_locator_chunk` pointer to NULL after freeing memory with
`srv6_locator_chunk_free()`.
This commit modifies the `srv6_locator_chunk_free()` function to take a
double pointer instead of a single pointer. In this way, setting the
`struct srv6_locator_chunk` pointer to NULL is no longer the
programmer's responsibility but is the responsibility of
`srv6_locator_chunk_free()`. This prevents programmers from making
mistakes such as forgetting to set the pointer to NULL after invoking
`srv6_locator_chunk_free()`.
Signed-off-by: Carmine Scarpitta <carmine.scarpitta@uniroma2.it>
In the current implementation of bgpd, SRv6 SIDs can be configured only
under the address-family. This enables bgpd to leak IPv6 routes using
an SRv6 End.DT6 behavior and IPv4 routes using an SRv6 End.DT4
behavior. It is not possible to leak both IPv6 and IPv4 routes using a
single SRv6 SID.
This commit adds a new CLI command
"sid vpn per-vrf export <sid_idx|auto>" that enables bgpd to leak both
IPv6 and IPv4 routes using a single SRv6 SID (End.DT46 behavior).
Signed-off-by: Carmine Scarpitta <carmine.scarpitta@uniroma2.it>
In order to send correct SRv6 L3VPN advertisement, we need to save
srv6_locator_chunk in vpn_policy. With this information, we can
construct correct SRv6 L3VPN advertisement packets.
Signed-off-by: Ryoga Saito <ryoga.saito@linecorp.com>
When primary global v6 unicast address is configured on an
unnumbered interface, BGP does not re-advertise updates out
with the new global v6 address as the nexthop
Signed-off-by: Pdoijode <pdoijode@nvidia.com>
RFC4364 describes peerings between multiple AS domains, to ease
the continuity of VPN services across multiple SPs. This commit
implements a sub-set of IETF option b) described in chapter 10 b.
The ASBR to ASBR approach is taken, with an EBGP peering between
the two routers. The EBGP peering must be directly connected to
the outgoing interface used. In those conditions, the next hop
is directly connected, and there is no need to have a transport
label to convey the VPN label. A new vty command is added on a
per interface basis:
This command if enabled, will permit to convey BGP VPN labels
without any transport labels (i.e. with implicit-null label).
restriction:
this command is used only for EBGP directly connected peerings.
Other use cases are not covered.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Running `bgp_srv6l3vpn_to_bgp_vrf` and `bgp_srv6l3vpn_to_bgp_vrf2`
topotests with `--valgrind-memleaks` gives several memory leak errors.
This is due to the way SRv6 locators are removed/unset in bgpd: when
an SRv6 locator is deleted or unset, the memory allocated for the
locator prefix (`tovpn_sid_locator`) is not freed.
This patch adds a `for` loop that iterates over the list of BGP
instances. For each BGP instance using the SRv6 locator to be
removed/unset, we use `XFREE()` to properly free the memory allocated
for `tovpn_sid_locator` after the SRv6 locator is removed or unset.
The memory allocated for `tovpn_sid_locator` cannot be freed before
calling `vpn_leak_postchange_all()`. This is because
after deleting an SRv6 locator, we call `vpn_leak_postchange_all()`
to handle the SRv6 locator deletion and send a BGP Prefix SID withdraw
message. `tovpn_sid_locator` is required to properly build the BGP
Prefix SID withdraw message. After calling `vpn_leak_postchange_all()`
we can safely remove the `tovpn_sid_locator` and free the allocated
memory.
Signed-off-by: Carmine Scarpitta <carmine.scarpitta@uniroma2.it>
Running `bgp_srv6l3vpn_to_bgp_vrf` and `bgp_srv6l3vpn_to_bgp_vrf2`
topotests with `--valgrind-memleaks` gives several memory leak errors.
This is due to the way SRv6 SIDs are removed in bgpd: when
an SRv6 locator is deleted/unset, all the SIDs allocated from that
locator are removed from the SRv6 functions list
(`bgp->srv6_functions`),but the memory allocated for the SIDs is not
freed.
This patch adds a call to `XFREE()` to properly free the allocated
memory when an SRv6 SID is removed.
Signed-off-by: Carmine Scarpitta <carmine.scarpitta@uniroma2.it>
Running `bgp_srv6l3vpn_to_bgp_vrf` and `bgp_srv6l3vpn_to_bgp_vrf2`
topotests with `--valgrind-memleaks` gives several memory leak errors.
This is due to the way SRv6 locators are deleted/unset in bgpd: when
an SRv6 locator is deleted/unset, all the chunks of the locator are
removed from the SRv6 locator chunks list (`bgp->srv6_locator_chunks`).
However, the memory allocated for the chunks is not freed.
This patch adds a call to the `srv6_locator_chunk_free()` function to
properly free the allocated memory when an SRv6 locator is removed or
unset.
Signed-off-by: Carmine Scarpitta <carmine.scarpitta@uniroma2.it>
RFC 4760 states we SHOULD ignore the NEXT_HOP attribute for BGP Update
messages carrying only MP_REACH_NLRI attributes. Thus we should use the
Network Address of Next Hop field of the MP_REACH_NLRI as the nexthop.
Instead of always looking for BGP_ATTR_NEXT_HOP, this commit ensures:
1) we set mp_nexthop_len to BGP_ATTR_NHLEN_IPV4 for v4 bgp_static routes
2) we check mp_nexthop_len when choosing the nexthop to use for nht
3) we check mp_nexthop_len when choosing the nexthop to send to zebra
4) we check mp_nexthop_len when picking the nexthop to shown by vtysh
Reported-by: Binon Gorbutt <binon@aervivo.com>
Signed-off-by: Trey Aspelund <taspelund@nvidia.com>
The bgp_path_info_to_ipv6_nexthop will correctly set
the nexthop value. There is no need to test this to
display something that won't be used in debug
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
In commit: d70a31a3ef
the Zapi ZEBRA_RULE_ADD message was modified but
the bgp version was not updated appropriately and
when zebra received the message it did not properly
read it.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Multipath route may have mixed nexthops of EVPN and IP unicast. Move
EVPN flag to nexthop to support such cases.
Signed-off-by: Xiao Liang <shaw.leon@gmail.com>
Start using mpls_lse_encode/mpls_lse_decode, that is endian-aware, because
we always use host-byte order, should use network-byte.
Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
Since additional information such as block_bits_length is needed to
generate SIDs properly, the type of elements in srv6_locator_chunks
list is extended from "struct prefix_ipv6 *" to
"struct srv6_locator_chunk *". Even in terms of variable name,
"struct srv6_locator_chunk *" is appropriate.
Signed-off-by: Nobuhiro MIKI <nmiki@yahoo-corp.jp>
When BGP detects that a peering is using a global address but no v6 LL
address has been created for the interface that the global address is
on warn the user that something is amiss and they need to fix it.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
This patch adds transpostion_offset and transposition_len to bgp_sid_info,
and transposes SID only at bgp_zebra_announce.
Signed-off-by: Ryoga Saito <ryoga.saito@linecorp.com>
Currently the Wait for Install code ( bgp_suppress_fib ) does
not properly handle two states from zebra: ROUTE_INSTALL_FAILED
and BETTER_ADMIN_DISTANCE_WON. Pre this change the WFI code
would just never notify our peers about a route install failure
but more is needed. In the ROUTE_INSTALL_FAILED and the
BETTER_ADMIN_DISTANCE_WON we need to notify our peers with
a withdrawal about the route, else we will continue to
draw traffic to us when we cannot legally do so.
Why is this needed? In either case imagine that we've already
received a bgp route, installed it and sent to our peers.
In the Better admin distance won case, say a static route is installed
at this point in time we must stop advertising the route through
us since we are not installed. As such a withdrawal must be sent.
In the ROUTE_INSTALL_FAILED case, the code was not properly handling
the situation where we have Route A, it was successfully installed
and then we received a update to Route A that was attempted to be
installed but failed. In this case we also need to send a withdrawal
Finally update the bgp_suppress_fib topotest to test both of these
situations.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>