Commit Graph

571 Commits

Author SHA1 Message Date
Donald Sharp
6de9f7fbf5 *: Move distance related defines into their own header
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2023-11-07 06:47:51 -05:00
David Lamparter
d889055d8e lib: convert if_zapi_callbacks into actual hooks
...so that multiple functions can be subscribed.

The create/destroy hooks are renamed to real/unreal because that's what
they *actually* signal.

Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
2023-11-02 17:10:43 -07:00
Russ White
644386fe48
Merge pull request #14388 from pguibert6WIND/redistribute_table_bgp_2
Redistribute table bgp without copying data to the default routing table
2023-10-31 13:23:57 -04:00
Philippe Guibert
b6367f8460 bgpd: add redistribute table-direct support
Add the 'redistribute table-direct' command under the bgp address-family
node. Handle the table-direct support wherever needed in the BGP code.

Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
2023-10-20 13:28:52 +02:00
Philippe Guibert
d162d5f6f5 bgpd: fix hardset l3vpn label available in mpls pool
Today, when configuring BGP L3VPN mpls, the operator may
use that command to hardset a label value:

> router bgp 65500 vrf vrf1
> address-family ipv4 unicast
> label vpn export <hardset_label_value>

Today, BGP uses this value without checks, leading to potential
conflicts with other control planes like LDP. For instance, if
LDP initiates with a label chunk of [16;72] and BGP also uses the
50 label value, a conflict arises.

The 'label manager' service in zebra oversees label allocations.
While all the control plane daemons use it, BGP doesn't when a
hardset label is in place.

This update fixes this problem. Now, when a hardset label is set for
l3vpn export, a request is made to the label manager for approval,
ensuring no conflicts with other daemons. But, this means some existing
BGP configurations might become non-operational if they conflict with
labels already allocated to another daemon but not used.

note: Labels below 16 are reserved and won't be checked for consistency
by the label manager.

Fixes: ddb5b4880b ("bgpd: vpn-vrf route leaking")
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
2023-10-18 09:41:02 +02:00
Donald Sharp
c18d7ddd78 bgpd: Remove unused cumulative bandwidth variable
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2023-10-12 13:35:39 -04:00
Donald Sharp
3e73271653 bgpd: Just pass down the Bandwidth unmodified so that Zebra can use it
Instead of scaling the bandwith to something between 1 and 100, just
send down the bandwidth Available for the link.

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2023-10-12 13:35:39 -04:00
Donald Sharp
4ab7fa86b0 Revert "bgpd: do not announce link-state routes to zebra"
This reverts commit 39fb34275f.
2023-10-10 16:43:59 -04:00
anlan_cs
b580c52698 *: remove ZEBRA_INTERFACE_VRF_UPDATE
Currently when one interface changes its VRF, zebra will send these messages to
all daemons in *order*:
    1) `ZEBRA_INTERFACE_DELETE` ( notify them delete from old VRF )
    2) `ZEBRA_INTERFACE_VRF_UPDATE` ( notify them move from old to new VRF )
    3) `ZEBRA_INTERFACE_ADD` ( notify them added into new VRF )

When daemons deal with `VRF_UPDATE`, they use
`zebra_interface_vrf_update_read()->if_lookup_by_name()`
to check the interface exist or not in old VRF. This check will always return
*NULL* because `DELETE` ( deleted from old VRF ) is already done, so can't
find this interface in old VRF.

Send `VRF_UPDATE` is redundant and unuseful. `DELETE` and `ADD` are enough,
they will deal with RB tree, so don't send this `VRF_UPDATE` message when
vrf changes.

Since all daemons have good mechanism to deal with changing vrf, and don't
use this `VRF_UPDATE` mechanism.  So, it is safe to completely remove
all the code with `VRF_UPDATE`.

Signed-off-by: anlan_cs <anlan_cs@tom.com>
2023-10-07 10:06:39 +08:00
Russ White
8e755a03a3
Merge pull request #12649 from louis-6wind/bgp-link-state
bgpd: add basic support of BGP Link-State RFC7752
2023-09-26 10:07:02 -04:00
Dmytro Shytyi
f20cf1457d bgpd,lib,sharpd,zebra: srv6 introduce multiple segs/SIDs in nexthop
Append zebra and lib to use muliple SRv6 segs SIDs, and keep one
seg SID for bgpd and sharpd.

Note: bgpd and sharpd compilation relies on the lib and zebra files,
i.e if we separate this: lib or zebra or bgpd or sharpd in different
commits - this will not compile.

Signed-off-by: Dmytro Shytyi <dmytro.shytyi@6wind.com>
2023-09-20 15:07:15 +02:00
Louis Scalbert
39fb34275f bgpd: do not announce link-state routes to zebra
Link-state prefixes are only intended to be read for a link-state
consumer (i.e. a controler). They cannot be installed in Forwarding
Information Base (FIB).

Do not announce them to zebra.

Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
2023-09-18 15:06:07 +02:00
Donald Sharp
d2ba78929f bgpd: bgp_fsm_change_status/BGP_TIMER_ON and BGP_EVENT_ADD
Modify bgp_fsm_change_status to be connection oriented and
also make the BGP_TIMER_ON and BGP_EVENT_ADD macros connection
oriented as well.  Attempt to make peer_xfer_conn a bit more
understandable because, frankly it was/is confusing.

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2023-09-10 08:31:25 -04:00
Donald Sharp
7b1158b169 bgpd: peer_established should be connection oriented
The peer_established function should be connection oriented.

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2023-09-10 08:31:25 -04:00
Donatas Abraitis
c4f761d8ea
Merge pull request #14282 from pguibert6WIND/fix_redistribute_table_flush
bgpd: fix redistribute table command after bgp restarts
2023-08-31 12:41:30 +03:00
Philippe Guibert
82b11d8889 bgpd: fix redistribute table command after bgp restarts
When the BGP 'redistribute table' command is used for a given route
table, and BGP configuration is flushed and rebuilt, the redistribution
does not work.

Actually, when flushing the BGP configuration with the 'no router bgp'
command, the BGP redistribute entries related to the 'redistribute table'
entries are not flushed. Actually, at BGP deletion, the table number is
not given as parameter in bgp_redistribute_unset() function, and the
redistribution entry is not removed in zebra.
Fix this by adding some code to flush all the redistribute table
instances.

Fixes: 7c8ff89e93 ("Multi-Instance OSPF  Summary")

Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
2023-08-29 11:37:18 +02:00
Yuqing Zhao
6e7f305e54 bgpd: Convert from struct bgp_node to struct bgp_dest
This is based on @donaldsharp's work

The current code base is the struct bgp_node data structure.
The problem with this is that it creates a bunch of
extra data per route_node.
The table structure generates ‘holder’ nodes
that are never going to receive bgp routes,
and now the memory of those nodes is allocated
as if they are a full bgp_node.

After splitting up the bgp_node into bgp_dest and route_node,
the memory of ‘holder’ node which does not have any bgp data
will be allocated as the route_node, not the bgp_node,
and the memory usage is reduced.
The memory usage of BGP node will be reduced from 200B to 96B.
The total memory usage optimization of this part is ~16.00%.

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Signed-off-by: Yuqing Zhao <xiaopanghu99@163.com>
2023-08-22 09:35:46 +08:00
Donatas Abraitis
0c7d6dfdf0
Merge pull request #14126 from LabNConsulting/ziemba-pbr-actions-mangling
pbrd: (3/3) add packet mangling actions (src/dst ip-addr/port, dscp, ecn)
2023-08-13 16:39:07 +03:00
Donatas Abraitis
456b63d8c8
Merge pull request #14099 from lkClare/formated_sync_0727
bgpd: bgp_path_info_extra memory optimization
2023-08-09 14:46:48 +03:00
G. Paul Ziemba
c47fd378f3 pbrd: add explicit 'family' field for rules
In the netlink-mediated kernel dataplane, each rule is stored
    in either an IPv4-specific database or an IPv6-specific database.
    PBRD opportunistically gleans each rule's address family value
    from its source or destination IP address match value (if either
    exists), or from its nexthop or nexthop-group (if it exists).

    The 'family' value is particularly needed for netlink during
    incremental rule deletion when none of the above fields remain set.

    Before now, this address family has been encoded by occult means
    in the (possibly otherwise unset) source/destination IP match
    fields in ZAPI and zebra.

    This commit documents the reasons for maintaining the 'family'
    field in the PBRD rule structure, adds a 'family' field in the
    common lib/pbr.h rule structure, and carries it explicitly in ZAPI.

Signed-off-by: G. Paul Ziemba <paulz@labn.net>
2023-08-08 10:18:22 -07:00
Valerian_He
98efa5bc6b bgpd: bgp_path_info_extra memory optimization
Even if some of the attributes in bgp_path_info_extra are
not used, their memory is still allocated every time. It
cause a waste of memory.
This commit code deletes all unnecessary attributes and
changes the optional attributes to pointer storage. Memory
will only be allocated when they are actually used. After
optimization, extra info related memory is reduced by about
half(~400B -> ~200B).

Signed-off-by: Valerian_He <1826906282@qq.com>
2023-08-08 10:48:07 +00:00
Donald Sharp
052debc3ee bgpd: Have bgp notice the zebra ability to use v6_with_v4_nexthops
Store the data.

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2023-08-03 08:25:20 -04:00
mobash-rasool
49f0484113
Merge pull request #14064 from donaldsharp/pim_cleanup
Cleanup from examining gcov runs
2023-07-26 21:33:29 +05:30
Donald Sharp
cc66dff0a3 bgpd: Cleanup bgp_zebra_announce_default to be cleaner
Over time the bgp_zebra_announce_default function has gotten
slightly convoluted, clean it up so it's easier to read

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2023-07-21 07:31:04 -04:00
G. Paul Ziemba
580a98b798 lib: zapi PBR common encode/decode
bgpd, pbrd: use common pbr encoder
    zebra: use common pbr decoder
    tests: pbr_topo1: check more filter fields

    Purpose:
	1. Reduce likelihood of zapi format mismatches when adding
	   PBR fields due to multiple parallel encoder implementations
	2. Encourage common PBR structure usage among various daemons
	3. Reduce coding errors via explicit per-field enable flags

Signed-off-by: G. Paul Ziemba <paulz@labn.net>
2023-07-20 08:10:45 -07:00
G. Paul Ziemba
dbade07e0e pbrd: add vlan filters pcp/vlan-id/vlan-flags; ip-protocol any (zapi)
Subset: ZAPI changes to send the new data

    Also adds filter_bm field; currently for PBR_FILTER_PCP, but in the
    future to be used for all of the filter fields.

    Changes by:
	Josh Werner <joshuawerner@mitre.org>
	Eli Baum <ebaum@mitre.org>
	G. Paul Ziemba <paulz@labn.net>

Signed-off-by: G. Paul Ziemba <paulz@labn.net>
2023-07-19 08:14:49 -07:00
Donald Sharp
1e0b6a601e bgpd: Fix table manager to use the synchronous client
bgp_zebra_tm_connect calls bgp_zebra_get_table_range which
just used the global zclient.  Which of course still had
us exposing the global zclient to read and drop important
data from zebra.  This fixes commit 787c61e03c

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2023-07-10 10:47:17 -04:00
Donatas Abraitis
9a0bb7bcd1
Merge pull request #13333 from donaldsharp/vrf_bitmap_cleanup
*: Rearrange vrf_bitmap_X api to reduce memory footprint
2023-07-04 22:11:11 +03:00
Mark Stapp
d8f0a8eb47
Merge pull request #13851 from opensourcerouting/fix/use_zclient_sync_for_table_manager
bgpd: Use synchronous Zebra client for table manager
2023-06-27 08:54:46 -04:00
Donatas Abraitis
4199f032e5
Merge pull request #13722 from fdumontet6WIND/color_extcomm
bgpd,lib,yang: add colored extended communities support
2023-06-27 13:03:22 +03:00
Donatas Abraitis
edf6d1917c bgpd: Guard zlog_debug for table manager when the connection is successful
We shouldn't use unguarded zlog_debug().

Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
2023-06-27 09:32:07 +03:00
Donatas Abraitis
ec3d30f55d bgpd: Use zlog_err when can't connect to table manager (zebra)
If this an error, we should use zlog_err, not zlog_info as this is literally
not an information, but an error.

Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
2023-06-27 09:29:52 +03:00
Donald Sharp
161972c9fe *: Rearrange vrf_bitmap_X api to reduce memory footprint
When running all daemons with config for most of them, FRR has
sharpd@janelle:~/frr$ vtysh -c "show debug hashtable"  | grep "VRF BIT HASH" | wc -l
3570

3570 hashes for bitmaps associated with the vrf.  This is a very
large number of hashes.  Let's do two things:

a) Reduce the created size of the actually created hashes to 2
instead of 32.

b) Delay generation of the hash *until* a set operation happens.
As that no hash directly implies a unset value if/when checked.

This reduces the number of hashes to 61 in my setup for normal
operation.

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2023-06-26 14:59:21 -04:00
Donatas Abraitis
787c61e03c bgpd: Use synchronous Zebra client for table manager
Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
2023-06-26 17:43:40 +03:00
Francois Dumontet
442e2edcfa bgpd: add functions related to srte_color management
Signed-off-by: Francois Dumontet <francois.dumontet@6wind.com>
2023-06-26 14:27:27 +02:00
Donatas Abraitis
257a0e0688 bgpd: Do not initialize global variable zclient_sync to NULL
Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
2023-06-20 20:50:40 +03:00
Donatas Abraitis
cf8a749934 bgpd: Reuse bgp_zebra_label_manager_ready() helper function
Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
2023-06-20 20:50:40 +03:00
Donatas Abraitis
2b768c5295 bgpd: Retry connecting to synchronouse label manager if not ready
Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
2023-06-20 20:50:38 +03:00
Donatas Abraitis
0043ebab99 bgpd: Use synchronous way to get labels from Zebra
Both the label manager and table manager zapi code send data requests via zapi
to zebra and then immediately listen for a response from zebra. The problem here
is of course that the listen part is throwing away any zapi command that is not
the one it is looking for.

ISIS/OSPF and PIM all have synchronous abilities via zapi, which they all
do through a special zapi connection to zebra. BGP needs to follow this model
as well. Additionally the new zclient_sync connection that should be created,
a once a second timer should wake up and read any data on the socket to
prevent problems too much data accumulating in the socket.

```
r3# sh bgp labelpool summary
Labelpool Summary
-----------------
Ledger:       3
InUse:        3
Requests:     0
LabelChunks:  1
Pending:      128
Reconnects:   1
r3# sh bgp labelpool inuse
Prefix                Label
---------------------------
10.0.0.1/32           16
192.168.31.0/24       17
192.168.32.0/24       18
r3#
```

Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
2023-06-20 20:50:10 +03:00
Russ White
68da3eab07
Merge pull request #13524 from pguibert6WIND/mpls_vpn_lsr_redistribute
MPLS vpn LSR redistribute
2023-06-20 09:13:33 -04:00
Philippe Guibert
27f4deed0a bgpd: update the mpls entry to handle return traffic
When advertising an mpls vpn entry with a new label,
the return traffic is redirected to the local machine,
but the MPLS traffic is dropped.

Add an MPLS entry to handle MPLS packets which have
the new label value. Traffic is swapped to the original
label value from the mpls vpn next-hop entry; then it is
sent to the resolved next-hop of the original next-hop
from the mpls vpn next-hop entry.

Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
2023-06-16 10:54:58 +02:00
Trey Aspelund
465d3e356d bgpd: track L3VNI VTEP-IPs in tip_hash
For whatever reason, we were only updating tip_hash when we processed an
L2VNI add/del. This adds tip_hash updates to the L3VNI add/del codepaths
so that their VTEP-IPs are also used when when considering martian
addresses, e.g. bgp_nexthop_self().

Signed-off-by: Trey Aspelund <taspelund@nvidia.com>
2023-05-30 15:20:35 +00:00
Philippe Guibert
1c6aa043ef bgpd: use nexthop interface when adding LSP in BGP MPLSVPN
BGP MPLSVPN next hop label allocation was using only the next-hop
IP address. As MPLSVPN contexts rely on bnc contexts, the real
nexthop interface is known, and the LSP entry to enter can apply
to the specific interface. To illustrate, the BGP service is able
to handle the following two iproute2 commands:

 > ip -f mpls route add 105 via inet 192.0.2.45 dev r1-eth1
 > ip -f mpls route add 105 via inet 192.0.2.46 dev r1-eth2

Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
2023-05-09 21:00:57 +02:00
Philippe Guibert
577be36a41 bgpd: add support for l3vpn per-nexthop label
This commit introduces a new method to associate a label to
prefixes to export to a VPNv4 backbone. All the methods to
associate a label to a BGP update is documented in rfc4364,
chapter 4.3.2. Initially, the "single label for an entire
VRF" method was available. This commit adds "single label
for each attachment circuit" method.

The change impacts the control-plane, because each BGP update
is checked to know if the nexthop has reachability in the VRF
or not. If this is the case, then a unique label for a given
destination IP in the VRF will be picked up. This label will
be reused for an other BGP update that will have the same
nexthop IP address.

The change impacts the data-plane, because the MPLs pop
mechanism applied to incoming labelled packets changes: the
MPLS label is popped, and the packet is directly sent to the
connected nexthop described in the previous outgoing BGP VPN
update.

By default per-vrf mode is done, but the user may choose
the per-nexthop mode, by using the vty command from the
previous commit. In the latter case, a per-vrf label
will however be allocated to handle networks that are not directly
connected. This is the case for local traffic for instance.

The change also include the following:

-  ECMP case
In case a route is learnt in a given VRF, and is resolved via an
ECMP nexthop. This implies that when exporting the route as a BGP
update, if label allocation per nexthop is used, then two possible
MPLS values could be picked up, which is not possible with the
current implementation. Actually, the NLRI for VPNv4 stores one
prefix, and one single label value, not two. Today, RFC8277 with
multiple label capability is not yet available.
To avoid this corner case, when a route is resolved via more than one
nexthop, the label allocation per nexthop will not apply, and the
default per-vrf label will be chosen.
Let us imagine BGP redistributes a static route using the `172.31.0.20`
nexthop. The nexthop resolution will find two different nexthops fo a
unique BGP update.

 > r1# show running-config
 > [..]
 > vrf vrf1
 >  ip route 172.31.0.30/32 172.31.0.20
 > r1# show bgp vrf vrf1 nexthop
 > [..]
 > 172.31.0.20 valid [IGP metric 0], #paths 1
 >  gate 192.0.2.11
 >  gate 192.0.2.12
 >  Last update: Mon Jan 16 09:27:09 2023
 >  Paths:
 >    1/1 172.31.0.30/32 VRF vrf1 flags 0x20018

To avoid this situation, BGP updates that resolve over multiple
nexthops are using the unique per-vrf label.

- recursive route case

Prefixes that need a recursive route to be resolved can
also be eligible for mpls allocation per nexthop. In that
case, the nexthop will be the recursive nexthop calculated.

To achieve this, all nexthop types in bnc contexts are valid,
except for the blackhole nexthops.

- network declared prefixes

Nexthop tracking is used to look for the reachability of the
prefixes. When the the 'no bgp network import-check' command
is used, network declared prefixes are maintained active,
even if there is no active nexthop.

Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
2023-05-09 21:00:57 +02:00
Donatas Abraitis
786e2b8bdb Revert "MPLS allocation mode per next hop"
Broken tests, let's revert now.

Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
2023-05-03 13:52:46 +03:00
Donatas Abraitis
99a1ab0b21
Merge pull request #12646 from pguibert6WIND/mpls_alloc_per_nh
MPLS allocation mode per next hop
2023-05-02 18:36:45 +03:00
Jafar Al-Gharaibeh
277eb2e580
Merge pull request #13060 from opensourcerouting/feature/allow_peering_with_127.0.0.1
bgpd: Allow peering via 127.0.0.0/8
2023-03-31 00:14:27 -05:00
Donatas Abraitis
c4e3d5569f
Merge pull request #13086 from donaldsharp/suppress_fib_pending
bgpd: Ensure suppress-fib-pending works with network statements
2023-03-27 21:55:58 +03:00
Donald Sharp
24a58196dd *: Convert event.h to frrevent.h
We should probably prevent any type of namespace collision
with something else.

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2023-03-24 08:32:17 -04:00
Donald Sharp
cd9d053741 *: Convert struct event_master to struct event_loop
Let's find a better name for it.

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2023-03-24 08:32:17 -04:00