Currently bgp multipath has these properties:
a) mp_info may or may not be on a single path, based
upon path perturbations in the past.
b) mp_info->count started counting at 0 (meaning 1), since the
bestpath path_info was never included in the count.
c) The first mp_info in the list held the multipath data associated
with the multipath; at any other node that data
was not filled in.
d) The mp_info's that were not first on the list were therefore
just pointers to the corresponding bgp_path_info in
the multipath.
e) On bestpath calculation, a linklist (struct linklist *) of bgp_path_info's
was created.
f) This linklist was passed to a comparison function that compared the
old mp_info list item by item to the linklist, doing
magic to figure out how to create a new mp_info list.
g) The old mp_info and the linklist both had to be memory managed
and freed.
h) BGP_PATH_MULTIPATH is only set on non-bestpath nodes in the
multipath.
This is really complicated. Let's change the algorithm to this:
a) When running bestpath, mark a bgp_path_info node that could be in the ecmp path as
BGP_PATH_MULTIPATH_NEW.
b) When running multipath, just walk the list of bgp_path_info's and,
if a node has BGP_PATH_MULTIPATH_NEW set, decide if it is in BGP_MULTIPATH.
If we run out of space to put in the ecmp, clear the flag on the rest.
c) Clean up the counting so that 1 is no longer sometimes added to the mpath count.
d) Only allocate an mpath_info node for the bestpath, and clean it up
when done with it.
e) Remove the unneeded list management associated with the linklist and
the mp_list.
This greatly simplifies multipath computation for bgp and reduces memory
load for large scale deployments.
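A minimal self-contained sketch of the new single-pass walk; the flag names,
struct layout and helper are stand-ins for illustration, not the actual FRR
code:

#include <stdint.h>

/* Invented stand-ins for the sketch; not the real FRR flag values. */
#define PATH_MULTIPATH     0x01 /* stands in for BGP_PATH_MULTIPATH */
#define PATH_MULTIPATH_NEW 0x02 /* stands in for BGP_PATH_MULTIPATH_NEW */

struct path_info {
	struct path_info *next; /* list is already sorted in bestpath order */
	uint32_t flags;
};

/* bestpath marked the ecmp candidates with PATH_MULTIPATH_NEW; walk the list
 * once, promote candidates to PATH_MULTIPATH until the ecmp limit is hit,
 * then clear the flag on the remainder. */
static void multipath_walk(struct path_info *bestpath, unsigned int mpath_limit)
{
	unsigned int ecmp = 1; /* the bestpath itself is now included in the count */

	for (struct path_info *pi = bestpath->next; pi; pi = pi->next) {
		if (!(pi->flags & PATH_MULTIPATH_NEW))
			continue;

		pi->flags &= ~PATH_MULTIPATH_NEW;

		if (ecmp < mpath_limit) {
			pi->flags |= PATH_MULTIPATH;
			ecmp++;
		} else {
			/* out of ecmp space: drop the candidate */
			pi->flags &= ~PATH_MULTIPATH;
		}
	}
}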
2 full feeds in work_queue_run prior:
0 56367.471 1123 50193 493695 50362 493791 0 0 0 TE work_queue_run
BGP multipath info : 1941844 48 110780992 1941844 110780992
2 full feeds in work_queue_run after change:
1 52924.931 1296 40837 465968 41025 487390 0 0 1 TE work_queue_run
BGP multipath info : 970860 32 38836880 970866 38837120
Approximately 4 seconds of saved cpu time for convergence and a ~75 MB
smaller runtime footprint.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
This is handy when you need to do source-peer matching, e.g. `match src-peer ...`,
in the outgoing direction with a route-map.
Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
There is a need to be able to process certain bgp
routes earlier than others, especially when there
is major trauma going on in the network. Start adding
the ability for this to happen.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
evpn has a concept of `local` tables where the evpn routes
are actually converted into underlying routes/neighbor
table entries (or vice versa). Then this local route
is propagated to the global evpn l2vpn table and sent
to the peers. Certain show commands in evpn operate
on the local table but make the output look like
the data has not been sent to the peer. This
is confusing for the operator. Modify the code
such that local tables get a `Local BGP table not advertised`
message in the place where the output talks about who has
received the data.
Example:
torm11# show bgp l2vpn evpn route vni 1000 mac 8a:a1:cc:73:a3:ac ip 45.0.0.5
BGP routing table entry for [2]:[0]:[48]:[8a:a1:cc:73:a3:ac]:[32]:[45.0.0.5]
Paths: (2 available, best #2)
Local BGP table not advertised
Route [2]:[0]:[48]:[8a:a1:cc:73:a3:ac]:[32]:[45.0.0.5] VNI 1000
Imported from 192.168.100.18:2:[2]:[0]:[48]:[8a:a1:cc:73:a3:ac]:[32]:[45.0.0.5], VNI 1000
65101 65005
192.168.100.18(leaf2) from leaf2(192.168.5.1) (192.168.100.14)
Origin IGP, valid, external
Extended Community: RT:65005:1000 ET:8
Last update: Thu Mar 21 14:29:04 2024
Route [2]:[0]:[48]:[8a:a1:cc:73:a3:ac]:[32]:[45.0.0.5] VNI 1000
Imported from 192.168.100.18:2:[2]:[0]:[48]:[8a:a1:cc:73:a3:ac]:[32]:[45.0.0.5], VNI 1000
65101 65005
192.168.100.18(leaf1) from leaf1(192.168.1.1) (192.168.100.13)
Origin IGP, valid, external, bestpath-from-AS 65101, best (Router ID)
Extended Community: RT:65005:1000 ET:8
Last update: Thu Mar 21 14:29:04 2024
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Fix static-analyser warnings with BGP labels:
> $ scan-build make -j12
> bgpd/bgp_updgrp_packet.c:819:10: warning: Access to field 'extra' results in a dereference of a null pointer (loaded from variable 'path') [core.NullDereference]
> ? &path->extra->labels->label[0]
> ^~~~~~~~~
Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
Add bgp_path_info_labels_same() to compare labels with labels from
path_info. Remove labels_same() that was used for mplsvpn only.
Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
The `bgp bestpath med missing-as-worst` command
was being accepted and applied during bestpath, but the output
of the routes affected by it gave no indication
that this was happening or what med value was being used.
Fixes: #15718
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Currently bgp_path_info's are stored in the reverse of the order
received. Sort them by the bestpath ordering instead.
This will allow for optimizations in the future on
how multipath is done.
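As a small illustration only, an insertion into the per-dest list at the
bestpath-sorted position instead of always prepending; the struct and the
comparator are stand-ins, not the real bgp_path_info or its comparison
function:

/* Illustration only: insert a path_info at its bestpath-sorted position. */
struct path_info {
	struct path_info *next;
	/* ... attributes the bestpath comparison looks at ... */
};

/* returns nonzero when 'a' is preferred over 'b' */
typedef int (*path_prefer_fn)(const struct path_info *a,
			      const struct path_info *b);

static void path_info_add_sorted(struct path_info **head,
				 struct path_info *new_pi,
				 path_prefer_fn prefer)
{
	struct path_info **cur = head;

	/* skip over entries that are still preferred over the new one */
	while (*cur && prefer(*cur, new_pi))
		cur = &(*cur)->next;

	new_pi->next = *cur;
	*cur = new_pi;
}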
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
This will allow a consistent approach to adding/removing
pi's to and from the workqueue for processing, as well as
handling the dest->info pi list more appropriately.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Add a new flag BGP_PATH_UNSORTED to keep track
of sorted -vs- unsorted path_info's, and add some
ability for the system to understand when that
flag is set.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
When running the `show ip bgp` command, the 'route short status' and
'network' columns do not have white-space between them.
Old show:
Network Next Hop Metric LocPrf Weight Path
*>i1.1.1.1/32 10.1.12.111 0 100 0 i
New show:
Network Next Hop Metric LocPrf Weight Path
*>i 1.1.1.1/32 10.1.12.111 0 100 0 i
Add white-space between them to enhance readability.
Signed-off-by: Cassiano Campes <cassiano.campes@venkonetworks.com>
The structure size of bgp_path_info_extra when compiled
with vnc is 184 bytes. Reduce this size to 72 bytes
when compiled with vnc but with vnc not necessarily
turned on.
With 2 full bgp feeds this saves approximately 100 MB
when compiling with vnc and not using vnc.
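A hedged sketch of one way to get that kind of saving (the member names are
illustrative, not the actual FRR layout): keep only a pointer to a
vnc-specific struct in the always-allocated extra, and allocate the vnc part
only when vnc actually touches the path.

#include <stdint.h>

#ifdef ENABLE_BGP_VNC
struct path_extra_vnc {		/* allocated only when vnc uses the path */
	void *rfapi_handle;	/* illustrative members */
	uint32_t vnc_flags;
	/* ... the rest of the vnc-only members ... */
};
#endif

struct path_extra {
	uint32_t igpmetric;
	/* ... members every path may need ... */
#ifdef ENABLE_BGP_VNC
	struct path_extra_vnc *vnc; /* 8 bytes instead of the full vnc block */
#endif
};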
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Without this change, when the route-map is changed, we never reinstall
the route.
We checked only some attributes like aspath, communities, large-communities,
and extended-communities, but ignored the rest of the attributes.
With this change, let's check whether the route-map has changed.
bgp_route_map_process_update() is triggered on route-map change, and we set
`changed` to true, which treats the aggregated route as not the same as it was before.
Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
Moved the loc-rib uptime field "bgp_rib_uptime" to struct bgp_path_info_extra
for memory concerns.
Moved the logic into bgp_route_update's callback bmp_route_update.
The timestamp is written in the per-peer header.
Signed-off-by: Maxence Younsi <mx.yns@outlook.fr>
Added a time_t field to bgp_path_info.
The value is set before the bgp dp hook is called.
The value is not set in the msg yet; testing and double checking are needed first.
Signed-off-by: Maxence Younsi <mx.yns@outlook.fr>
Set the peer type flag to 3 for loc-rib monitoring.
Leave it at 0 in the other cases as before, even though RFC7854 tells us to set
it to 0, 1 or 2 depending on the case (global/rd/local instance).
Signed-off-by: Maxence Younsi <mx.yns@outlook.fr>
But it never really does due to locking; since it can,
we need to treat it as if it does and ensure that FRR
is not making the mistake of using memory after it
has been freed.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
This is based on @donaldsharp's work
The current code base uses the struct bgp_node data structure.
The problem with this is that it creates a bunch of
extra data per route_node.
The table structure generates 'holder' nodes
that are never going to receive bgp routes,
yet the memory of those nodes is allocated
as if they were a full bgp_node.
After splitting bgp_node into bgp_dest and route_node,
the memory of a 'holder' node which does not have any bgp data
is allocated as a route_node, not a bgp_node,
and the memory usage is reduced.
The memory usage of a BGP node is reduced from 200B to 96B.
The total memory usage optimization of this part is ~16%.
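A generic illustration of where the saving comes from (sizes and members are
invented for the sketch, not the FRR definitions): holder nodes only need the
trie linkage, so only the nodes that actually carry BGP data pay for the
BGP-specific state.

#include <stdint.h>

struct route_node_sketch {		/* what every trie position costs */
	struct route_node_sketch *parent;
	struct route_node_sketch *link[2];
	void *info;			/* NULL on 'holder' nodes */
};

struct bgp_dest_sketch {		/* only for real BGP destinations */
	struct route_node_sketch node;	/* embeds the plain trie node */
	void *adj_out;			/* bgp-only bookkeeping */
	void *adj_in;
	uint64_t version;
};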
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Signed-off-by: Yuqing Zhao <xiaopanghu99@163.com>
Even if some of the attributes in bgp_path_info_extra are
not used, their memory is still allocated every time, which
wastes memory.
This commit deletes all unnecessary attributes and
changes the optional attributes to pointer storage. Memory
will only be allocated when they are actually used. After
optimization, the extra-info related memory is reduced by about
half (~400B -> ~200B).
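The pattern, as a small self-contained sketch (names invented for the
illustration): the optional block hangs off a pointer and is only calloc'd
the first time something needs it.

#include <stdlib.h>

struct path_extra_optional {
	/* ... members that only some paths ever use ... */
	unsigned char flags;
};

struct path_extra {
	/* ... always-needed members ... */
	struct path_extra_optional *opt; /* NULL until actually needed */
};

/* get-or-allocate accessor: a path only pays for the optional block on first use */
static struct path_extra_optional *path_extra_opt_get(struct path_extra *extra)
{
	if (!extra->opt)
		extra->opt = calloc(1, sizeof(*extra->opt));
	return extra->opt;
}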
Signed-off-by: Valerian_He <1826906282@qq.com>
The advertised label value from mpls vpn routes is not modified
when the advertised next-hop is modified to next-hop-self.
Actually, the original label value received is redistributed as
is, whereas the new_label value bound in the nexthop label
bind entry should be used.
Only the VPN entries that contain MPLS information, and that
are redistributed between distinct peers, will have a label
value to advertise.
- no SRv6 attribute
- no local prefix
- no exported VPN prefixes from a VRF
If the advertisement to a given peer has the next-hop modified,
then the new label value will be picked up. The considered cases
are peers configured with 'next-hop-self' option, or ebgp peerings
without the 'next-hop-unchanged' option.
Note that the NLRI format will follow the rfc3107 format, as
multiple label values for MPLS VPN NLRIs are not supported
(rfc8277 is not supported).
Note also that the case where an outgoing route-map is applied to
the outgoing neighbor is not considered in this commit.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
The current implementation does not offer a new label to bind
to a received VPN route entry so that it can be redistributed
with that new label.
This commit allocates a label for VPN entries that have
a valid label, and a reachable next-hop interface that is
configured as follows:
> interface eth0
> mpls bgp l3vpn-multi-domain-switching
> exit
An mplsvpn next-hop label binding entry is created in an mpls
vpn nexthop label bind hash table of the current BGP instance.
That mpls vpn next-hop label entry is indexed by the (next-hop,
orig_label) values provided by the incoming updates, and shared
with other updates having the same (next-hop, orig_label) values.
A new 'LP_TYPE_BGP_L3VPN_BIND' label value is picked up from the
zebra mpls label pool, and assigned to the new_label attribute.
The 'bgp_path_info' adds a 'bgp_mplsvpn_nh_label_bind' structure
to the 'mplsvpn' union structure. The two structures in the union are
never used at the same time, as a path is either a VRF update to export
or an MPLS VPN update. Using a union gives a 24-byte memory gain compared
to not putting the structures in a union (24 bytes instead of 48 bytes).
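As a rough self-contained sketch of the (next-hop, orig_label) indexing
described above (struct layout and hash are illustrative, not the FRR
implementation):

#include <stdint.h>
#include <stddef.h>
#include <netinet/in.h>

struct nh_label_bind_key {
	struct in6_addr nexthop;	/* next-hop of the incoming VPN update */
	uint32_t orig_label;		/* label carried by the incoming VPN update */
};

struct nh_label_bind_entry {
	struct nh_label_bind_key key;
	uint32_t new_label;		/* allocated from the LP_TYPE_BGP_L3VPN_BIND pool */
	unsigned int path_count;	/* shared by all paths with the same key */
};

/* hash both key members so updates with the same (next-hop, orig_label)
 * find and reuse the same binding entry */
static uint32_t nh_label_bind_hash(const struct nh_label_bind_key *key)
{
	uint32_t h = key->orig_label;

	for (size_t i = 0; i < sizeof(key->nexthop.s6_addr); i++)
		h = h * 31 + key->nexthop.s6_addr[i];

	return h;
}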
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
The label-per-nexthop attributes take 24 bytes per bgp path
entry on the AMD64 platform, and are only used for unicast paths.
The current patch-set introduces similar attributes, but ones that
will be used only for l3vpn paths. To gain some memory on the
bgp_path_info structure in the next commit, do some changes.
Create an 'mplsvpn' union structure that will include either the
label-per-nexthop structs for ipv4 paths or the l3vpn path
structures. The 'label_nexthop_cache' and 'label_nh_thread'
attributes of the 'bgp_path_info' structure are moved into the
union under a new structure called 'bgp_mplsvpn_label_nh_blnc'.
The flags attribute of 'bgp_path_info' is increased from 16 bits
to 32 bits, and the BGP_PATH_MPLSVPN_LABEL_NH flag is added to
indicate the 'mplsvpn' usage.
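A hedged sketch of the resulting layout; the member names only approximate
the description above and are not the exact FRR definitions:

#include <stdint.h>

struct path_info_sketch {
	uint32_t flags;	/* widened from 16 to 32 bits; one new bit records
			 * which arm of the union is live */
	union {
		struct {	/* label per nexthop: VRF unicast paths being exported */
			void *label_nexthop_cache;
			void *label_nh_thread;
		} blnc;
		struct {	/* l3vpn (next-hop, orig_label) binding: received VPN paths */
			void *nh_label_bind_cache;
			void *nh_label_bind_thread;
		} bmnc;
	} mplsvpn;
};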
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
BGP cannot decide whether to disseminate a particular safi based
solely upon the bgp suppress-fib command. Modify the code to also
look at the safi when deciding whether to communicate the
particular node to a peer.
Ticket: #3402926
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
When using `addpath-tx-all`, BGP announces all known paths instead of
announcing only an arbitrary number of best paths.
With this new command we can send the N best paths to the neighbor. That
means we send the best path, then the second best path excluding the
previous one, and so on. In other words, we run the best path selection
algorithm N times before we finish.
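A small self-contained sketch of that "run the selection N times, excluding
what was already picked" idea (struct and comparator are stand-ins, not the
FRR code):

#include <stdbool.h>
#include <stddef.h>

struct path {
	struct path *next;
	bool already_selected;
};

/* stand-in for the bestpath comparison: nonzero when 'a' beats 'b' */
extern int path_preferred(const struct path *a, const struct path *b);

static size_t select_n_best(struct path *head, struct path **out, size_t n)
{
	size_t picked = 0;

	while (picked < n) {
		struct path *best = NULL;

		for (struct path *p = head; p; p = p->next) {
			if (p->already_selected)
				continue; /* excluded: chosen in an earlier round */
			if (!best || path_preferred(p, best))
				best = p;
		}

		if (!best)
			break; /* fewer than n usable paths */

		best->already_selected = true;
		out[picked++] = best;
	}

	return picked;
}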
Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
Reuse subgroup_process_announce_selected(); it does the same thing
that we were duplicating here.
Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
This commit introduces a new method to associate a label with
prefixes exported to a VPNv4 backbone. All the methods to
associate a label with a BGP update are documented in rfc4364,
chapter 4.3.2. Initially, only the "single label for an entire
VRF" method was available. This commit adds the "single label
for each attachment circuit" method.
The change impacts the control-plane, because each BGP update
is checked to know whether the nexthop has reachability in the VRF.
If this is the case, then a unique label for the given
destination IP in the VRF will be picked up. This label will
be reused for another BGP update that has the same
nexthop IP address.
The change impacts the data-plane, because the MPLS pop
mechanism applied to incoming labelled packets changes: the
MPLS label is popped, and the packet is directly sent to the
connected nexthop described in the previous outgoing BGP VPN
update.
By default the per-vrf mode is used, but the user may choose
the per-nexthop mode by using the vty command from the
previous commit. In the latter case, a per-vrf label
will still be allocated to handle networks that are not directly
connected. This is the case for local traffic, for instance.
The change also includes the following:
- ECMP case
A route may be learnt in a given VRF and resolved via an
ECMP nexthop. This implies that when exporting the route as a BGP
update, if label allocation per nexthop is used, then two possible
MPLS values could be picked up, which is not possible with the
current implementation. Actually, the NLRI for VPNv4 stores one
prefix and one single label value, not two. Today, RFC8277 with
multiple label capability is not yet available.
To avoid this corner case, when a route is resolved via more than one
nexthop, the label allocation per nexthop will not apply, and the
default per-vrf label will be chosen.
Let us imagine BGP redistributes a static route using the `172.31.0.20`
nexthop. The nexthop resolution will find two different nexthops for a
unique BGP update.
> r1# show running-config
> [..]
> vrf vrf1
> ip route 172.31.0.30/32 172.31.0.20
> r1# show bgp vrf vrf1 nexthop
> [..]
> 172.31.0.20 valid [IGP metric 0], #paths 1
> gate 192.0.2.11
> gate 192.0.2.12
> Last update: Mon Jan 16 09:27:09 2023
> Paths:
> 1/1 172.31.0.30/32 VRF vrf1 flags 0x20018
To avoid this situation, BGP updates that resolve over multiple
nexthops use the unique per-vrf label.
- recursive route case
Prefixes that need a recursive route to be resolved can
also be eligible for mpls allocation per nexthop. In that
case, the nexthop will be the calculated recursive nexthop.
To achieve this, all nexthop types in bnc contexts are valid,
except for blackhole nexthops.
- network declared prefixes
Nexthop tracking is used to look for the reachability of the
prefixes. When the 'no bgp network import-check' command
is used, network-declared prefixes are kept active,
even if there is no active nexthop.
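As a rough sketch of the label choice described above (the helper names are
hypothetical, not the FRR API):

#include <stdbool.h>
#include <stdint.h>

struct vrf_label_state {
	uint32_t per_vrf_label;		/* always allocated, used as the fallback */
};

/* hypothetical helpers for the sketch */
extern unsigned int nh_resolved_path_count(const void *nh_ctx);
extern bool nh_is_blackhole(const void *nh_ctx);
extern uint32_t per_nexthop_label_get(struct vrf_label_state *vrf, const void *nh_ctx);

static uint32_t vpn_export_label(struct vrf_label_state *vrf, const void *nh_ctx,
				 bool per_nexthop_mode)
{
	/* ECMP, blackhole or unresolved nexthops keep the per-VRF label, since a
	 * VPNv4 NLRI carries a single label value (no RFC 8277 support). */
	if (!per_nexthop_mode || !nh_ctx || nh_is_blackhole(nh_ctx) ||
	    nh_resolved_path_count(nh_ctx) != 1)
		return vrf->per_vrf_label;

	/* one reachable nexthop: reuse the label bound to that nexthop */
	return per_nexthop_label_get(vrf, nh_ctx);
}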
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Memory leaks are observed in the cleanup code. When "no router bgp" is executed,
cleanup in that flow for the aggregate-address command is not taken care of.
This fixes the below leak:
--
./bgp_local_asn_dot.test_bgp_local_asn_dot_agg/r3.bgpd.asan.3410444:Direct leak of 152 byte(s) in 1 object(s) allocated from:
./bgp_local_asn_dot.test_bgp_local_asn_dot_agg/r3.bgpd.asan.3410444- #0 0x7f163e911037 in __interceptor_calloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:154
./bgp_local_asn_dot.test_bgp_local_asn_dot_agg/r3.bgpd.asan.3410444- #1 0x7f163e4b9259 in qcalloc lib/memory.c:105
./bgp_local_asn_dot.test_bgp_local_asn_dot_agg/r3.bgpd.asan.3410444- #2 0x562bf42ebbd5 in bgp_aggregate_new bgpd/bgp_route.c:7239
./bgp_local_asn_dot.test_bgp_local_asn_dot_agg/r3.bgpd.asan.3410444- #3 0x562bf42f14e8 in bgp_aggregate_set bgpd/bgp_route.c:8421
./bgp_local_asn_dot.test_bgp_local_asn_dot_agg/r3.bgpd.asan.3410444- #4 0x562bf42f1e55 in aggregate_addressv6_magic bgpd/bgp_route.c:8592
./bgp_local_asn_dot.test_bgp_local_asn_dot_agg/r3.bgpd.asan.3410444- #5 0x562bf42be3f5 in aggregate_addressv6 bgpd/bgp_route_clippy.c:341
./bgp_local_asn_dot.test_bgp_local_asn_dot_agg/r3.bgpd.asan.3410444- #6 0x7f163e3f1e1b in cmd_execute_command_real lib/command.c:988
./bgp_local_asn_dot.test_bgp_local_asn_dot_agg/r3.bgpd.asan.3410444- #7 0x7f163e3f219c in cmd_execute_command lib/command.c:1048
./bgp_local_asn_dot.test_bgp_local_asn_dot_agg/r3.bgpd.asan.3410444- #8 0x7f163e3f2df4 in cmd_execute lib/command.c:1215
./bgp_local_asn_dot.test_bgp_local_asn_dot_agg/r3.bgpd.asan.3410444- #9 0x7f163e5a2d73 in vty_command lib/vty.c:544
./bgp_local_asn_dot.test_bgp_local_asn_dot_agg/r3.bgpd.asan.3410444- #10 0x7f163e5a79c8 in vty_execute lib/vty.c:1307
./bgp_local_asn_dot.test_bgp_local_asn_dot_agg/r3.bgpd.asan.3410444- #11 0x7f163e5ad299 in vtysh_read lib/vty.c:2216
./bgp_local_asn_dot.test_bgp_local_asn_dot_agg/r3.bgpd.asan.3410444- #12 0x7f163e593f16 in event_call lib/event.c:1995
./bgp_local_asn_dot.test_bgp_local_asn_dot_agg/r3.bgpd.asan.3410444- #13 0x7f163e47c839 in frr_run lib/libfrr.c:1185
./bgp_local_asn_dot.test_bgp_local_asn_dot_agg/r3.bgpd.asan.3410444- #14 0x562bf414e58d in main bgpd/bgp_main.c:505
./bgp_local_asn_dot.test_bgp_local_asn_dot_agg/r3.bgpd.asan.3410444- #15 0x7f163de66d09 in __libc_start_main ../csu/libc-start.c:308
./bgp_local_asn_dot.test_bgp_local_asn_dot_agg/r3.bgpd.asan.3410444-
./bgp_local_asn_dot.test_bgp_local_asn_dot_agg/r3.bgpd.asan.3410444:Direct leak of 152 byte(s) in 1 object(s) allocated from:
./bgp_local_asn_dot.test_bgp_local_asn_dot_agg/r3.bgpd.asan.3410444- #0 0x7f163e911037 in __interceptor_calloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:154
./bgp_local_asn_dot.test_bgp_local_asn_dot_agg/r3.bgpd.asan.3410444- #1 0x7f163e4b9259 in qcalloc lib/memory.c:105
./bgp_local_asn_dot.test_bgp_local_asn_dot_agg/r3.bgpd.asan.3410444- #2 0x562bf42ebbd5 in bgp_aggregate_new bgpd/bgp_route.c:7239
./bgp_local_asn_dot.test_bgp_local_asn_dot_agg/r3.bgpd.asan.3410444- #3 0x562bf42f14e8 in bgp_aggregate_set bgpd/bgp_route.c:8421
./bgp_local_asn_dot.test_bgp_local_asn_dot_agg/r3.bgpd.asan.3410444- #4 0x562bf42f1cde in aggregate_addressv4_magic bgpd/bgp_route.c:8543
./bgp_local_asn_dot.test_bgp_local_asn_dot_agg/r3.bgpd.asan.3410444- #5 0x562bf42bd258 in aggregate_addressv4 bgpd/bgp_route_clippy.c:255
./bgp_local_asn_dot.test_bgp_local_asn_dot_agg/r3.bgpd.asan.3410444- #6 0x7f163e3f1e1b in cmd_execute_command_real lib/command.c:988
./bgp_local_asn_dot.test_bgp_local_asn_dot_agg/r3.bgpd.asan.3410444- #7 0x7f163e3f219c in cmd_execute_command lib/command.c:1048
./bgp_local_asn_dot.test_bgp_local_asn_dot_agg/r3.bgpd.asan.3410444- #8 0x7f163e3f2df4 in cmd_execute lib/command.c:1215
./bgp_local_asn_dot.test_bgp_local_asn_dot_agg/r3.bgpd.asan.3410444- #9 0x7f163e5a2d73 in vty_command lib/vty.c:544
./bgp_local_asn_dot.test_bgp_local_asn_dot_agg/r3.bgpd.asan.3410444- #10 0x7f163e5a79c8 in vty_execute lib/vty.c:1307
./bgp_local_asn_dot.test_bgp_local_asn_dot_agg/r3.bgpd.asan.3410444- #11 0x7f163e5ad299 in vtysh_read lib/vty.c:2216
./bgp_local_asn_dot.test_bgp_local_asn_dot_agg/r3.bgpd.asan.3410444- #12 0x7f163e593f16 in event_call lib/event.c:1995
./bgp_local_asn_dot.test_bgp_local_asn_dot_agg/r3.bgpd.asan.3410444- #13 0x7f163e47c839 in frr_run lib/libfrr.c:1185
./bgp_local_asn_dot.test_bgp_local_asn_dot_agg/r3.bgpd.asan.3410444- #14 0x562bf414e58d in main bgpd/bgp_main.c:505
./bgp_local_asn_dot.test_bgp_local_asn_dot_agg/r3.bgpd.asan.3410444- #15 0x7f163de66d09 in __libc_start_main ../csu/libc-start.c:308
./bgp_local_asn_dot.test_bgp_local_asn_dot_agg/r3.bgpd.asan.3410444-
./bgp_local_asn_dot.test_bgp_local_asn_dot_agg/r3.bgpd.asan.3410444-SUMMARY: AddressSanitizer: 304 byte(s) leaked in 2 allocation(s).
Signed-off-by: Samanvitha B Bhargav <bsamanvitha@vmware.com>