Commit Graph

7496 Commits

Author SHA1 Message Date
Russ White
17a0a625f0
Merge pull request #15284 from opensourcerouting/feature/bgpd_announce_rpki_state_knob
bgpd: Add neighbor X send-community extended rpki command
2024-02-13 09:35:10 -05:00
Donatas Abraitis
26faf341ef
Merge pull request #15352 from louis-6wind/fix-leak-recursive
bgpd: fix route recursion on leaked routes
2024-02-12 21:42:03 +02:00
Donald Sharp
9800590ccc
Merge pull request #15346 from opensourcerouting/fix/memory_optimizations
Some more memory optimizations
2024-02-11 21:33:44 -05:00
Louis Scalbert
59a544c39b bgpd: fix route recursion on leaked routes
Leaked recursive routes are not resolved.

> VRF r1-cust1:
> B>  5.1.0.0/24 [200/98] via 99.0.0.1 (recursive), weight 1, 00:00:08
>  *                       via 192.168.1.2, r1-eth4, weight 1, 00:00:08
> B>* 99.0.0.1/32 [200/0] via 192.168.1.2, r1-eth4, weight 1, 00:00:08

> VRF r1-cust4:
> B   5.1.0.0/24 [20/98] via 99.0.0.1 (vrf r1-cust1) inactive, weight 1, 00:00:08
> B>* 99.0.0.1/32 [20/0] via 192.168.1.2, r1-eth4 (vrf r1-cust1), weight 1, 00:00:08

When announcing the routes to zebra, use the peer of the ultimate bgp
path info instead of the one of the first parent path info to determine
whether the route is recursive.
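
A minimal sketch of that lookup, using hypothetical stand-in types (`struct path_sketch`, `struct peer_sketch`) rather than the real bgpd structures, which this log does not show. The rule that only iBGP-learned paths are recursive is an assumption made for illustration:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; not the real bgpd definitions. */
enum peer_sort_sketch { SKETCH_IBGP, SKETCH_EBGP };

struct peer_sketch {
	enum peer_sort_sketch sort;
};

struct path_sketch {
	struct peer_sketch *peer;
	struct path_sketch *parent; /* path this one was leaked from, or NULL */
};

/* Follow the parent chain to the path the leak ultimately derives from. */
static const struct path_sketch *path_ultimate(const struct path_sketch *p)
{
	assert(p != NULL);
	while (p->parent)
		p = p->parent;
	return p;
}

/* Assumption for illustration: only iBGP-learned paths are recursive. */
static bool path_is_recursive(const struct path_sketch *p)
{
	return path_ultimate(p)->peer->sort == SKETCH_IBGP;
}
```

Deciding from the ultimate path rather than the first parent means that a route leaked through several VRFs is still judged by the peer it was originally learned from.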

The result is:
> VRF r1-cust4:
> B>  5.1.0.0/24 [20/98] via 99.0.0.1 (vrf r1-cust1) (recursive), weight 1, 00:00:02
>   *                      via 192.168.1.2, r1-eth4 (vrf r1-cust1), weight 1, 00:00:02
> B>* 99.0.0.1/32 [20/0] via 192.168.1.2, r1-eth4 (vrf r1-cust1), weight 1, 00:00:02

Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
2024-02-09 18:25:01 +01:00
Alexander Skorichenko
f4da4398f4 bgpd: fix minttl copying during peer reset
Include gtsm_hops (minttl) field when copying peer structure,
so that a new connection could set a proper value.

Signed-off-by: Alexander Skorichenko <askorichenko@netgate.com>
2024-02-09 16:58:52 +01:00
Donatas Abraitis
4dccc31884 bgpd: Optimize memory for peer_connection struct
```
struct peer_connection {
	struct peer *              peer;                 /*     0     8 */
	enum bgp_fsm_status        status;               /*     8     4 */
	enum bgp_fsm_status        ostatus;              /*    12     4 */
	int                        fd;                   /*    16     4 */
	uint32_t                   thread_flags;         /*    20     4 */
	pthread_mutex_t            io_mtx;               /*    24    40 */
	/* --- cacheline 1 boundary (64 bytes) --- */
	struct stream_fifo *       ibuf;                 /*    64     8 */
	struct stream_fifo *       obuf;                 /*    72     8 */
	struct ringbuf *           ibuf_work;            /*    80     8 */
	struct event *             t_read;               /*    88     8 */
	struct event *             t_write;              /*    96     8 */
	struct event *             t_connect;            /*   104     8 */
	struct event *             t_delayopen;          /*   112     8 */
	struct event *             t_start;              /*   120     8 */
	/* --- cacheline 2 boundary (128 bytes) --- */
	struct event *             t_holdtime;           /*   128     8 */
	struct event *             t_connect_check_r;    /*   136     8 */
	struct event *             t_connect_check_w;    /*   144     8 */
	struct event *             t_gr_restart;         /*   152     8 */
	struct event *             t_gr_stale;           /*   160     8 */
	struct event *             t_generate_updgrp_packets; /*   168     8 */
	struct event *             t_pmax_restart;       /*   176     8 */
	struct event *             t_routeadv;           /*   184     8 */
	/* --- cacheline 3 boundary (192 bytes) --- */
	struct event *             t_process_packet;     /*   192     8 */
	struct event *             t_process_packet_error; /*   200     8 */
	union sockunion            su;                   /*   208   128 */

	/* size: 336, cachelines: 6, members: 25 */
	/* last cacheline: 16 bytes */
};   /* saved 8 bytes! */
```
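
The savings reported in these pahole dumps come from reordering members so that alignment padding disappears. A small illustration with a hypothetical struct (not one from bgpd), assuming a common ABI in which `uint64_t` is aligned more strictly than `uint8_t`:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical example, not a bgpd struct. */
struct loose {
	uint8_t flag_a;   /* padding follows, up to the alignment of `counter` */
	uint64_t counter;
	uint8_t flag_b;   /* tail padding follows */
};

struct tight {
	uint64_t counter; /* most strictly aligned member first */
	uint8_t flag_a;
	uint8_t flag_b;   /* the two small members sit next to each other */
};

/* Same members in both structs; only the amount of padding differs. */
_Static_assert(sizeof(struct tight) < sizeof(struct loose),
	       "reordering should remove padding");
```

Dumps like the ones in these commits can be produced by running `pahole -C <struct name>` on an object file built with debug information.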

Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
2024-02-09 17:29:35 +02:00
Donatas Abraitis
c6e5b8030f bgpd: Optimize memory for bgp_nexthop_cache struct
```
struct bgp_nexthop_cache {
	afi_t                      afi;                  /*     0     4 */
	ifindex_t                  ifindex_ipv6_ll;      /*     4     4 */
	struct bgp_nexthop_cache_item entry;             /*     8    32 */
	uint32_t                   metric;               /*    40     4 */
	uint8_t                    nexthop_num;          /*    44     1 */
	_Bool                      is_evpn_gwip_nexthop; /*    45     1 */
	uint16_t                   change_flags;         /*    46     2 */
	struct nexthop *           nexthop;              /*    48     8 */
	time_t                     last_update;          /*    56     8 */
	/* --- cacheline 1 boundary (64 bytes) --- */
	uint16_t                   flags;                /*    64     2 */

	/* XXX 2 bytes hole, try to pack */

	uint32_t                   srte_color;           /*    68     4 */
	struct bgp_nexthop_cache_head * tree;            /*    72     8 */
	struct prefix              prefix __attribute__((__aligned__(8))); /*    80    56 */
	/* --- cacheline 2 boundary (128 bytes) was 8 bytes ago --- */
	void *                     nht_info;             /*   136     8 */
	struct path_list           paths;                /*   144     8 */
	unsigned int               path_count;           /*   152     4 */

	/* XXX 4 bytes hole, try to pack */

	struct bgp *               bgp;                  /*   160     8 */

	/* size: 168, cachelines: 3, members: 17 */
	/* sum members: 162, holes: 2, sum holes: 6 */
	/* forced alignments: 1 */
	/* last cacheline: 40 bytes */
} __attribute__((__aligned__(8)));   /* saved 16 bytes! */
```

Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
2024-02-09 17:25:05 +02:00
Donatas Abraitis
d13abf1180 bgpd: Optimize memory for ecommunity struct
```
struct ecommunity {
	long unsigned int          refcnt;               /*     0     8 */
	uint8_t                    unit_size;            /*     8     1 */
	_Bool                      disable_ieee_floating; /*     9     1 */

	/* XXX 2 bytes hole, try to pack */

	uint32_t                   size;                 /*    12     4 */
	uint8_t *                  val;                  /*    16     8 */
	char *                     str;                  /*    24     8 */

	/* size: 32, cachelines: 1, members: 6 */
	/* sum members: 30, holes: 1, sum holes: 2 */
	/* last cacheline: 32 bytes */
};   /* saved 8 bytes! */
```

Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
2024-02-09 17:21:23 +02:00
Donatas Abraitis
1fce318efc bgpd: Optimize memory for bgp_adj_out struct
```
struct bgp_adj_out {
	struct rb_entry            adj_entry;            /*     0    32 */

	/* XXX last struct has 4 bytes of padding */

	struct update_subgroup *   subgroup;             /*    32     8 */
	struct {
		struct bgp_adj_out * tqe_next;           /*    40     8 */
		struct bgp_adj_out * * tqe_prev;         /*    48     8 */
	} subgrp_adj_train;                              /*    40    16 */
	struct bgp_dest *          dest;                 /*    56     8 */
	/* --- cacheline 1 boundary (64 bytes) --- */
	uint32_t                   addpath_tx_id;        /*    64     4 */
	uint32_t                   attr_hash;            /*    68     4 */
	struct attr *              attr;                 /*    72     8 */
	struct bgp_advertise *     adv;                  /*    80     8 */

	/* size: 88, cachelines: 2, members: 8 */
	/* paddings: 1, sum paddings: 4 */
	/* last cacheline: 24 bytes */
};   /* saved 8 bytes! */
```

Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
2024-02-09 17:18:22 +02:00
Donald Sharp
afa07a7f3a
Merge pull request #15255 from louis-6wind/bgp-leak-interface
bgpd: fix interface of routes leaked from another VRF
2024-02-08 11:18:58 -05:00
Philippe Guibert
ec6e09c271 bgpd: fix flushing ipv6 flowspec entries when peering stops
When a BGP flowspec peering stops, the BGP RIB entries for IPv6
flowspec entries are removed, but not the ZEBRA RIB IPv6 entries.

When bgp_zebra_withdraw() is called, only AFI_IP is ever passed to
bgp_pbr_update_entry(), the function in charge of adding and deleting
Flowspec entries in zebra. Fix this by passing the AFI as a parameter
to the bgp_zebra_withdraw() function.

Note that topotests do not expose the problem, because the
flowspec driver code is not present (it was refused). Without it,
routes are not installed, and cannot be uninstalled.

Fixes: 529efa2346 ("bgpd: allow flowspec entries to be announced to zebra")
Link: https://github.com/FRRouting/frr/pull/2025

Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
2024-02-07 23:01:25 +01:00
Donatas Abraitis
4d7975ee59 bgpd: Add neighbor X send-community extended rpki command
By default, iBGP and eBGP-OAD peers exchange the RPKI extended community.

Add a command to disable sending RPKI extended community if needed.

Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
2024-02-07 22:35:21 +02:00
Donald Sharp
7fe05d6185
Merge pull request #15314 from opensourcerouting/fix/remove_bgp_evpn_attr_get_df_pref
bgpd: A couple random EVPN findings
2024-02-07 07:44:07 -05:00
Donald Sharp
1bc2fa3584
Merge pull request #15305 from louis-6wind/label-dead-code
bgpd: remove dead label code in bgp_update
2024-02-06 14:50:56 -05:00
Donald Sharp
a791deff91
Merge pull request #15311 from louis-6wind/fix-show-srv6-sid
bgpd: fix displaying srv6 sid
2024-02-06 11:40:14 -05:00
Louis Scalbert
0603626184 bgpd: remove dead label code in bgp_update
No need to init new_attr. It is not used until it is overridden.

> new_attr = *attr;

Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
2024-02-06 13:30:14 +01:00
Donatas Abraitis
bd7bad9121 bgpd: Drop unused function bgp_evpn_attr_get_df_pref()
Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
2024-02-06 14:02:51 +02:00
Louis Scalbert
e2138a634d bgpd: fix displaying srv6 sid
98efa5bc6b ("bgpd: bgp_path_info_extra memory optimization") has removed
SID info from the extra structure.

Do not test for extra presence.

Fixes: 98efa5bc6b ("bgpd: bgp_path_info_extra memory optimization")
Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
2024-02-06 11:21:21 +01:00
Donatas Abraitis
c8acc6709c bgpd: Send dynamic capability when on/off FQDN capability
Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
2024-02-05 12:23:53 +02:00
Donatas Abraitis
04e2401d20 bgpd: Do not reset the session if turning on/off FQDN capability
Allow BGP dynamic capabilities to handle this gracefully.

Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
2024-02-05 10:43:50 +02:00
Donatas Abraitis
3e99dcc626 bgpd: Send FQDN capability via dynamic capability if enabled
Since we have a knob to disable sending FQDN capability, it MUST be checked
before sending it using dynamic capabilities.

Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
2024-02-05 08:22:46 +02:00
Donald Sharp
5bc104120b bgpd: Prevent rpki from hooking multiple times into rcu code
As far as I can tell, the rpki code creates a pthread that
is used to handle the i/o associated with talking to the
remote rpki server.  The problem that we are having is that
the rpki code in FRR wants to behave like FRR code and use
the zlog_XXX functions.  These functions all depend on
the RCU code, which is a bit picky (and rightly so!)
about being started up properly and shut down properly.

This commit is fixing the problem of shutdown.  From
playing with the rpki code, I was able to experimentally
determine that the rpki_create_socket callback function
can be called multiple times per pthread.  Additionally
I was able to clearly see multiple *different* pthreads
actually be created.  This leaves the possibility
that each time it is called it might be hooking into the
RCU code, which makes the RCU code unhappy on shutdown.

Let's address the issue by checking to see if this pthread
has already hooked into the RCU code or not.  If so
then don't do this again.
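
One way to sketch such a check is a thread-local flag. Everything below is hypothetical: `rcu_hook_stub()` stands in for the real per-thread RCU registration, and the actual change in bgpd may track this state differently:

```c
#include <assert.h>
#include <stdbool.h>

/* Set once the calling thread has hooked into RCU. */
static _Thread_local bool rcu_hooked;

/* Counts registrations per thread, only to make the sketch observable. */
static _Thread_local int rcu_hook_count;

/* Hypothetical stand-in for the real per-thread RCU registration. */
static void rcu_hook_stub(void)
{
	assert(!rcu_hooked); /* hooking twice is the bug being avoided */
	rcu_hook_count++;
}

/* Safe to call from a callback that may run several times per pthread. */
static void ensure_rcu_hooked(void)
{
	if (rcu_hooked)
		return;
	rcu_hook_stub();
	rcu_hooked = true;
}
```

Because the flag is thread-local, each distinct pthread still registers once, while repeated callbacks on the same pthread become no-ops.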

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2024-02-03 23:10:19 -05:00
Donald Sharp
3a6c3129dd
Merge pull request #15265 from louis-6wind/fix-rpki-logs
bgpd,lib: fix logging from rpki_create_socket()
2024-02-03 08:40:27 -05:00
Donatas Abraitis
8629700bc8
Merge pull request #15192 from fdumontet6WIND/capa_nego
bgpd: add [no]neighbor capability fqdn
2024-02-03 12:19:53 +02:00
Francois Dumontet
e146ea53ef bgpd: add [no]neighbor capability fqdn command
Cisco routers do not deal gracefully with unsupported capabilities.
When a Cisco router receives an unsupported capability, it resets the
negotiation without notifying the peer of the unsupported capability as
described in RFC 2842.
Cisco suggests the use of
neighbor x.x.x.x capability fqdn
to avoid the use of FQDN in the OPEN message.

This new command removes the FQDN capability from the
OPEN message sent to the peer "x.x.x.x".

Link: https://www.cisco.com/c/en/us/support/docs/ip/border-gateway-protocol-bgp/116189-problemsolution-technology-00.pdf

Signed-off-by: Francois Dumontet <francois.dumontet@6wind.com>
2024-02-02 11:31:47 +01:00
Louis Scalbert
fdaf08bb46 bgpd: fix logging from rpki_create_socket()
Fix the following crash when logging from rpki_create_socket():

> #0  raise (sig=<optimized out>) at ../sysdeps/unix/sysv/linux/raise.c:50
> #1  0x00007f6e21723798 in core_handler (signo=6, siginfo=0x7f6e1e502ef0, context=0x7f6e1e502dc0) at lib/sigevent.c:248
> #2  <signal handler called>
> #3  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
> #4  0x00007f6e2144e537 in __GI_abort () at abort.c:79
> #5  0x00007f6e2176348e in _zlog_assert_failed (xref=0x7f6e2180c920 <_xref.16>, extra=0x0) at lib/zlog.c:670
> #6  0x00007f6e216b1eda in rcu_read_lock () at lib/frrcu.c:294
> #7  0x00007f6e21762da8 in vzlog_notls (xref=0x0, prio=2, fmt=0x7f6e217afe50 "%s:%d: %s(): assertion (%s) failed", ap=0x7f6e1e504248) at lib/zlog.c:425
> #8  0x00007f6e217632fb in vzlogx (xref=0x0, prio=2, fmt=0x7f6e217afe50 "%s:%d: %s(): assertion (%s) failed", ap=0x7f6e1e504248) at lib/zlog.c:627
> #9  0x00007f6e217621f5 in zlog (prio=2, fmt=0x7f6e217afe50 "%s:%d: %s(): assertion (%s) failed") at lib/zlog.h:73
> #10 0x00007f6e21763596 in _zlog_assert_failed (xref=0x7f6e2180c920 <_xref.16>, extra=0x0) at lib/zlog.c:687
> #11 0x00007f6e216b1eda in rcu_read_lock () at lib/frrcu.c:294
> #12 0x00007f6e21762da8 in vzlog_notls (xref=0x7f6e21a50040 <_xref.68>, prio=4, fmt=0x7f6e21a4999f "getaddrinfo: debug", ap=0x7f6e1e504878) at lib/zlog.c:425
> #13 0x00007f6e217632fb in vzlogx (xref=0x7f6e21a50040 <_xref.68>, prio=4, fmt=0x7f6e21a4999f "getaddrinfo: debug", ap=0x7f6e1e504878) at lib/zlog.c:627
> #14 0x00007f6e21a3f774 in zlog_ref (xref=0x7f6e21a50040 <_xref.68>, fmt=0x7f6e21a4999f "getaddrinfo: debug") at ./lib/zlog.h:84
> #15 0x00007f6e21a451b2 in rpki_create_socket (_cache=0x55729149cc30) at bgpd/bgp_rpki.c:1337
> #16 0x00007f6e2120e7b7 in tr_tcp_open (tr_socket=0x5572914d1520) at rtrlib/rtrlib/transport/tcp/tcp_transport.c:111
> #17 0x00007f6e2120e212 in tr_open (socket=0x5572914b5e00) at rtrlib/rtrlib/transport/transport.c:16
> #18 0x00007f6e2120faa2 in rtr_fsm_start (rtr_socket=0x557290e17180) at rtrlib/rtrlib/rtr/rtr.c:130
> #19 0x00007f6e218b7ea7 in start_thread (arg=<optimized out>) at pthread_create.c:477
> #20 0x00007f6e21527a2f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

rpki_create_socket() is a hook function called from the rtrlib library.
The issue arises because rtrlib initiates its own separate pthread in which
it runs the hook, which does not establish an FRR RCU context. Consequently,
this leads to failures in the logging mechanism that relies on RCU.

Initialize a new FRR pthread context from the rtrlib pthread with a
valid RCU context to allow logging from the rpki_create_socket() and
dependent functions.

Link: https://github.com/FRRouting/frr/issues/15260
Fixes: a951752d4a ("bgpd: create cache server socket in vrf")
Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
2024-02-02 10:35:10 +01:00
Donald Sharp
a4f222292b
Merge pull request #15275 from opensourcerouting/fix/one_more_memory_optimization_attr_struct
bgpd: One more attr struct memory optimization
2024-02-01 20:46:09 -05:00
Donald Sharp
9d8fd14b56
Merge pull request #15276 from mjstapp/port_registry
*: create a single registry of daemons' default port values
2024-02-01 16:07:11 -05:00
Donald Sharp
62443d7f66
Merge pull request #15264 from opensourcerouting/fix/memory_optimization
bgpd: Optimize memory for rd_ip struct
2024-02-01 14:55:18 -05:00
Mark Stapp
72b31b96fc *: create a single registry of daemons' default port values
Create a single registry of default port values that daemons
are using. Most of these are vty ports, but there are some
others for features like ospfapi and zebra FPM.
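
A rough sketch of what a single registry buys: one table that every lookup goes through. The table layout and function name are hypothetical, and the actual commit may well use plain macros in a shared header. The four vty ports listed are the long-standing FRR defaults:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical registry layout; only the port numbers are FRR defaults. */
struct port_entry_sketch {
	const char *daemon;
	int vty_port;
};

static const struct port_entry_sketch port_registry[] = {
	{ "zebra", 2601 },
	{ "ripd", 2602 },
	{ "ospfd", 2604 },
	{ "bgpd", 2605 },
};

/* Return the default vty port for `daemon`, or -1 if it is not registered. */
static int default_vty_port(const char *daemon)
{
	assert(daemon != NULL);
	for (size_t i = 0; i < sizeof(port_registry) / sizeof(port_registry[0]); i++) {
		if (strcmp(port_registry[i].daemon, daemon) == 0)
			return port_registry[i].vty_port;
	}
	return -1;
}
```

With every default in one place, a clash between two daemons is visible at a glance instead of being spread across per-daemon headers.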

Signed-off-by: Mark Stapp <mjs@labn.net>
2024-02-01 11:40:02 -05:00
Donatas Abraitis
0223b98c5c bgpd: One more attr struct memory optimization
```
struct attr {
	struct aspath *            aspath;               /*     0     8 */
	struct community *         community;            /*     8     8 */
	long unsigned int          refcnt;               /*    16     8 */
	_uint64_t                  flag;                 /*    24     8 */
	struct in_addr             nexthop;              /*    32     4 */
	uint32_t                   med;                  /*    36     4 */
	uint32_t                   local_pref;           /*    40     4 */
	ifindex_t                  nh_ifindex;           /*    44     4 */
	uint8_t                    nh_flags;             /*    48     1 */
	uint8_t                    origin;               /*    49     1 */
	uint8_t                    es_flags;             /*    50     1 */
	uint8_t                    router_flag;          /*    51     1 */
	uint8_t                    distance;             /*    52     1 */
	uint8_t                    df_alg;               /*    53     1 */
	uint16_t                   df_pref;              /*    54     2 */
	enum pta_type              pmsi_tnl_type;        /*    56     4 */
	uint32_t                   rmap_change_flags;    /*    60     4 */
	/* --- cacheline 1 boundary (64 bytes) --- */
	struct in6_addr            mp_nexthop_global;    /*    64    16 */
	struct in6_addr            mp_nexthop_local;     /*    80    16 */
	ifindex_t                  nh_lla_ifindex;       /*    96     4 */
	mpls_label_t               label;                /*   100     4 */
	struct ecommunity *        ecommunity;           /*   104     8 */
	struct ecommunity *        ipv6_ecommunity;      /*   112     8 */
	struct lcommunity *        lcommunity;           /*   120     8 */
	/* --- cacheline 2 boundary (128 bytes) --- */
	struct cluster_list *      cluster1;             /*   128     8 */
	struct transit *           transit;              /*   136     8 */
	struct in_addr             mp_nexthop_global_in; /*   144     4 */
	struct in_addr             aggregator_addr;      /*   148     4 */
	struct in_addr             originator_id;        /*   152     4 */
	uint32_t                   weight;               /*   156     4 */
	as_t                       aggregator_as;        /*   160     4 */
	uint8_t                    mp_nexthop_len;       /*   164     1 */
	uint8_t                    mp_nexthop_prefer_global; /*   165     1 */
	uint8_t                    sticky;               /*   166     1 */
	uint8_t                    default_gw;           /*   167     1 */
	route_tag_t                tag;                  /*   168     4 */
	uint32_t                   label_index;          /*   172     4 */
	struct bgp_attr_srv6_vpn * srv6_vpn;             /*   176     8 */
	struct bgp_attr_srv6_l3vpn * srv6_l3vpn;         /*   184     8 */
	/* --- cacheline 3 boundary (192 bytes) --- */
	struct bgp_attr_encap_subtlv * encap_subtlvs;    /*   192     8 */
	struct bgp_attr_encap_subtlv * vnc_subtlvs;      /*   200     8 */
	struct bgp_route_evpn      evpn_overlay;         /*   208    36 */
	uint32_t                   mm_seqnum;            /*   244     4 */
	uint32_t                   mm_sync_seqnum;       /*   248     4 */
	struct ethaddr             rmac;                 /*   252     6 */
	/* --- cacheline 4 boundary (256 bytes) was 2 bytes ago --- */
	uint16_t                   encap_tunneltype;     /*   258     2 */
	uint32_t                   rmap_table_id;        /*   260     4 */
	uint32_t                   link_bw;              /*   264     4 */
	esi_t                      esi;                  /*   268    10 */

	/* XXX 2 bytes hole, try to pack */

	uint32_t                   srte_color;           /*   280     4 */
	enum nexthop_types_t       nh_type;              /*   284     4 */
	enum blackhole_type        bh_type;              /*   288     4 */
	uint32_t                   otc;                  /*   292     4 */
	_uint64_t                  aigp_metric;          /*   296     8 */

	/* size: 304, cachelines: 5, members: 54 */
	/* sum members: 302, holes: 1, sum holes: 2 */
	/* last cacheline: 48 bytes */
};   /* saved 8 bytes! */
```

Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
2024-02-01 17:28:42 +02:00
Donald Sharp
bb1e1265aa bgpd: Save memory when using bgp_path_info_extra and vnc
The structure size of bgp_path_info_extra when compiled
with vnc is 184 bytes.  Reduce this size to 72 bytes
when compiled with vnc but vnc is not necessarily
turned on.

With 2 full bgp feeds this saves approximately 100 MB
when compiling with vnc and not using vnc.
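
A common way to get this kind of saving is to move the rarely used part behind a pointer and allocate it on first use. The sketch below assumes that approach; the type names, the member list, and the 128-byte blob are all hypothetical and do not reflect the real bgp_path_info_extra layout:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical VNC-only data; the size is arbitrary. */
struct vnc_extra_sketch {
	char blob[128];
};

/* Hypothetical "extra" struct: pays one pointer until VNC data is needed. */
struct extra_sketch {
	unsigned int num_labels;
	struct vnc_extra_sketch *vnc; /* NULL until first use */
};

/* Return the VNC part, allocating it zeroed on first use (NULL on failure). */
static struct vnc_extra_sketch *extra_vnc_get(struct extra_sketch *e)
{
	assert(e != NULL);
	if (!e->vnc)
		e->vnc = calloc(1, sizeof(*e->vnc));
	return e->vnc;
}

/* Release the VNC part, if any. */
static void extra_vnc_free(struct extra_sketch *e)
{
	assert(e != NULL);
	free(e->vnc);
	e->vnc = NULL;
}
```

Paths that never touch VNC then carry only the pointer, which is where a per-path saving multiplied over full feeds would come from.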

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2024-02-01 07:54:35 -05:00
Louis Scalbert
db7cf73a33 bgpd: fix interface on leaks from redistribute connected
In the target VRF's Routing Information Base (RIB), routes that are
leaked and originate from the 'redistribute connected' command have
their next-hop interface set as the interface from the source VRF.
This prevents the IP address of the connected interface from being
reachable from the target VRF.

> router bgp 5227 vrf r1-cust1
>  address-family ipv4 unicast
>   redistribute connected
>   rd vpn export 10:1
>   rt vpn import 52:100
>   rt vpn export 52:101
>   export vpn
>   import vpn
>  exit-address-family
> exit
> !
> router bgp 5227 vrf r1-cust4
>  address-family ipv4 unicast
>   network 192.0.2.0/24
>   rd vpn export 10:1
>   rt vpn import 52:101
>   rt vpn export 52:100
>   export vpn
>   import vpn
>  exit-address-family
> exit
> !
> vrf r1-cust1
>  ip route 192.0.2.0/24 r1-cust4 nexthop-vrf r1-cust4

Extract from the routing table:
> VRF r1-cust1:
> C>* 172.16.29.0/24 is directly connected, r1-eth4, 00:44:15
> S>* 192.0.2.0/24 [1/0] is directly connected, r1-cust4 (vrf r1-cust4), weight 1, 00:00:30
>
> VRF r1-cust4:
> B>* 172.16.29.0/24 [20/0] is directly connected, r1-eth4 (vrf r1-cust1), weight 1, 00:00:02

In r1-cust4 VRF, the nexthop interface of 172.16.29.0/24 is r1-eth4,
which is unknown in the context. The following ping does not work:

> # tcpdump -lnni r1-cust1 'icmp' &
> # ip vrf exec r1-cust4 ping -c1 -I 192.0.2.1 172.16.29.1
> PING 172.16.29.1 (172.16.29.1) 56(84) bytes of data.
> PING 172.16.29.1 (172.16.29.1) from 192.0.2.1 : 56(84) bytes of data.
> 18:49:20.635638 IP 192.0.2.1 > 172.16.29.1: ICMP echo request, id 15897, seq 1, length 64
> 18:49:27.113827 IP 192.0.2.1 > 192.0.2.1: ICMP host 172.16.29.1 unreachable, length 92

Fix the issue by setting nh_ifindex to the index of the VRF master
interface of the incoming BGP instance. The result is:

> VRF r1-cust4:
> C>* 192.0.2.0/24 is directly connected, r1-cust5, 00:27:40
> B>* 172.16.29.0/24 [20/0] is directly connected, r1-cust1 (vrf r1-cust1), weight 1, 00:00:08

> # tcpdump -lnni r1-cust1 'icmp' &
> # ping -c1 172.16.29.1 -I 192.0.2.1
> PING 172.16.29.1 (172.16.29.1) from 192.0.2.1 : 56(84) bytes of data.
> 18:48:32.506281 IP 192.0.2.1 > 172.16.29.1: ICMP echo request, id 15870, seq 1, length 64
> 64 bytes from 172.16.29.1: icmp_seq=1 ttl=64 time=0.050 ms
> 18:48:32.506304 IP 172.16.29.1 > 192.0.2.1: ICMP echo reply, id 15870, seq 1, length 64

Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
2024-02-01 10:21:43 +01:00
Louis Scalbert
067fbab4e4 bgpd: fix interface on leaks from network statement
Leaked routes from prefixes defined with 'network <prefix>' are inactive
because they have no valid nexthop interface.

> vrf r1-cust1
>  ip route 172.16.29.0/24 192.168.1.2
> router bgp 5227 vrf r1-cust1
>  no bgp network import-check
>  address-family ipv4 unicast
>   network 172.16.29.0/24
>   rd vpn export 10:1
>   rt vpn import 52:100
>   rt vpn export 52:101
>   export vpn
>   import vpn
>  exit-address-family
> exit
> !
> router bgp 5227 vrf r1-cust4
>  bgp router-id 192.168.1.1
> !
>  address-family ipv4 unicast
>   network 192.0.2/24
>   rd vpn export 10:1
>   rt vpn import 52:101
>   rt vpn export 52:100
>   export vpn
>   import vpn
>  exit-address-family
> exit

Extract from the routing table:

> VRF r1-cust1:
> S>* 172.16.29.0/24 [1/0] via 192.168.1.2, r1-eth4, weight 1, 00:47:53
>
> VRF r1-cust4:
> B   172.16.29.0/24 [20/0] is directly connected, unknown (vrf r1-cust1) inactive, weight 1, 00:03:40

Routes imported through the "network" command, as opposed to those
redistributed from the routing table, do not associate with any specific
interface.

When leaking prefixes from other VRFs, if the route was imported from the
network statement (i.e. static sub-type), set nh_ifindex to the index of
the VRF master interface of the incoming BGP instance.
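
The rule can be sketched as follows. Apart from `nh_ifindex`, every name here is a hypothetical stand-in, and the boolean parameter replaces whatever sub-type test bgpd actually performs:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical view of the BGP instance the route is leaked from. */
struct bgp_instance_sketch {
	unsigned int vrf_master_ifindex; /* ifindex of the VRF master device */
};

/* Hypothetical subset of the attributes of the leaked route. */
struct leaked_attr_sketch {
	unsigned int nh_ifindex; /* 0 means no interface is known */
};

/* Give routes from a network statement a usable nexthop interface. */
static void leak_set_ifindex(struct leaked_attr_sketch *attr,
			     const struct bgp_instance_sketch *from_bgp,
			     bool from_network_statement)
{
	assert(attr != NULL && from_bgp != NULL);
	if (from_network_statement)
		attr->nh_ifindex = from_bgp->vrf_master_ifindex;
}
```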

The result is:

> VRF r1-cust4:
> B>* 172.16.29.0/24 [20/0] is directly connected, r1-cust1 (vrf r1-cust1), weight 1, 00:00:08

Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
2024-02-01 10:21:43 +01:00
Donatas Abraitis
bd3b17d27d
Merge pull request #15258 from louis-6wind/fix-adj-in-attr
bgpd: fix attr comparaison bgp_adj_in_set
2024-01-31 15:06:40 +02:00
Donatas Abraitis
0fd46e3f4e bgpd: Optimize memory for rd_ip struct
```
struct rd_ip {
	uint16_t                   type;                 /*     0     2 */
	uint16_t                   val;                  /*     2     2 */
	struct in_addr             ip;                   /*     4     4 */

	/* size: 8, cachelines: 1, members: 3 */
	/* last cacheline: 8 bytes */
};   /* saved 4 bytes! */
```

Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
2024-01-31 11:29:30 +02:00
Russ White
61aa468a04
Merge pull request #15257 from opensourcerouting/fix/reinstall_aggregate_route_if_rmap
bgpd: Reinstall aggregated routes if using route-maps and it was changed
2024-01-30 15:08:08 -05:00
Donald Sharp
d633a81dbf
Merge pull request #15250 from opensourcerouting/fix/memory_optimizations
bgpd: Some memory optimizations
2024-01-30 10:56:35 -05:00
Louis Scalbert
5c0aab103d bgpd: fix attr comparaison bgp_adj_in_set
In bgp_adj_in_set(), attr has not yet been interned, so the pointer
adj->attr is always different from attr. As a result, adj->attr is always
uninterned and re-interned, even if attr and adj->attr are identical.

Fix the comparison.
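
The pitfall can be shown with a toy intern pool. All names and the two-field attribute are hypothetical; the point is only that pointer equality is meaningful between two interned attributes, while an interned and a not-yet-interned attribute must be compared by content:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical two-field attribute, far smaller than the real struct attr. */
struct attr_sketch {
	uint32_t med;
	uint32_t local_pref;
};

#define POOL_MAX 8
static struct attr_sketch pool[POOL_MAX];
static size_t pool_len;

/* Content comparison. */
static bool attr_same(const struct attr_sketch *a, const struct attr_sketch *b)
{
	assert(a != NULL && b != NULL);
	return a->med == b->med && a->local_pref == b->local_pref;
}

/* Return the canonical pooled copy of `a`, adding it if missing (NULL if full). */
static const struct attr_sketch *attr_intern_sketch(const struct attr_sketch *a)
{
	for (size_t i = 0; i < pool_len; i++) {
		if (attr_same(&pool[i], a))
			return &pool[i];
	}
	if (pool_len == POOL_MAX)
		return NULL;
	pool[pool_len] = *a;
	return &pool[pool_len++];
}

/* `stored` is interned, `incoming` is not: `stored != incoming` is always true. */
static bool adj_attr_changed(const struct attr_sketch *stored,
			     const struct attr_sketch *incoming)
{
	return !attr_same(stored, incoming);
}
```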

Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
2024-01-30 15:24:32 +01:00
Donatas Abraitis
ee1986f1b5 bgpd: Reinstall aggregated routes if using route-maps and it was changed
Without this change, we never reinstall the aggregated route when the
route-map has changed.

We checked only some attributes, such as aspath, communities, large-communities,
and extended-communities, and ignored the rest of the attributes.

With this change, let's check if the route-map has changed.

bgp_route_map_process_update() is triggered on route-map change, and we set
`changed` to true, which treats the aggregated route as different from what it was before.
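
In sketch form, the "is it the same route?" test gains one extra input. The struct, its two key fields, and the function name are hypothetical stand-ins for the handful of attributes that were already being compared:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the attributes that were already compared. */
struct aggr_attrs_sketch {
	uint32_t aspath_key;
	uint32_t community_key;
};

/* True when the installed aggregate can be kept as it is. */
static bool aggregate_is_same(const struct aggr_attrs_sketch *installed,
			      const struct aggr_attrs_sketch *computed,
			      bool rmap_changed)
{
	assert(installed != NULL && computed != NULL);
	/* A changed route-map may alter attributes not compared below. */
	if (rmap_changed)
		return false;
	return installed->aspath_key == computed->aspath_key &&
	       installed->community_key == computed->community_key;
}
```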

Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
2024-01-30 15:47:49 +02:00
Donatas Abraitis
48856741bd bgpd: Optimize memory usage for bgp_damp_config struct
```
struct bgp_damp_config {
        unsigned int               suppress_value;       /*     0     4 */
        unsigned int               reuse_limit;          /*     4     4 */
        time_t                     max_suppress_time;    /*     8     8 */
        time_t                     half_life;            /*    16     8 */
        unsigned int               reuse_list_size;      /*    24     4 */
        unsigned int               reuse_index_size;     /*    28     4 */
        unsigned int               ceiling;              /*    32     4 */
        unsigned int               decay_rate_per_tick;  /*    36     4 */
        unsigned int               decay_array_size;     /*    40     4 */
        unsigned int               reuse_scale_factor;   /*    44     4 */
        double                     scale_factor;         /*    48     8 */
        double *                   decay_array;          /*    56     8 */
        /* --- cacheline 1 boundary (64 bytes) --- */
        int *                      reuse_index;          /*    64     8 */
        struct bgp_damp_info * *   reuse_list;           /*    72     8 */
        int                        reuse_offset;         /*    80     4 */
        safi_t                     safi;                 /*    84     4 */
        struct bgp_damp_info *     no_reuse_list;        /*    88     8 */
        struct event *             t_reuse;              /*    96     8 */
        afi_t                      afi;                  /*   104     4 */

        /* size: 112, cachelines: 2, members: 19 */
        /* padding: 4 */
        /* last cacheline: 48 bytes */
};   /* saved 8 bytes! */
```

Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
2024-01-30 08:12:59 +02:00
Donatas Abraitis
a699cc1796 bgpd: Optimize memory usage for bgp_aggregate struct
```
struct bgp_aggregate {
        uint8_t                    summary_only;         /*     0     1 */
        uint8_t                    as_set;               /*     1     1 */
        uint8_t                    origin;               /*     2     1 */
        _Bool                      med_mismatched;       /*     3     1 */
        _Bool                      med_initialized;      /*     4     1 */
        _Bool                      match_med;            /*     5     1 */

        /* XXX 2 bytes hole, try to pack */

        struct {
                char *             name;                 /*     8     8 */
                struct route_map * map;                  /*    16     8 */
        } rmap;                                          /*     8    16 */
        long unsigned int          count;                /*    24     8 */
        long unsigned int          incomplete_origin_count; /*    32     8 */
        long unsigned int          egp_origin_count;     /*    40     8 */
        struct hash *              community_hash;       /*    48     8 */
        struct hash *              ecommunity_hash;      /*    56     8 */
        /* --- cacheline 1 boundary (64 bytes) --- */
        struct hash *              lcommunity_hash;      /*    64     8 */
        struct hash *              aspath_hash;          /*    72     8 */
        struct community *         community;            /*    80     8 */
        struct ecommunity *        ecommunity;           /*    88     8 */
        struct lcommunity *        lcommunity;           /*    96     8 */
        struct aspath *            aspath;               /*   104     8 */
        safi_t                     safi;                 /*   112     4 */
        uint32_t                   med_matched_value;    /*   116     4 */
        char *                     suppress_map_name;    /*   120     8 */
        /* --- cacheline 2 boundary (128 bytes) --- */
        struct route_map *         suppress_map;         /*   128     8 */

        /* size: 136, cachelines: 3, members: 22 */
        /* sum members: 134, holes: 1, sum holes: 2 */
        /* last cacheline: 8 bytes */
};
```

Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
2024-01-30 08:12:59 +02:00
Donatas Abraitis
0804038916 bgpd: Optimize memory usage for bgp_static struct
```
struct bgp_static {
        int                        backdoor;             /*     0     4 */
        uint32_t                   label_index;          /*     4     4 */
        uint8_t                    valid;                /*     8     1 */

        /* XXX 1 byte hole, try to pack */

        uint16_t                   encap_tunneltype;     /*    10     2 */
        uint32_t                   igpmetric;            /*    12     4 */
        struct in_addr             igpnexthop;           /*    16     4 */
        uint32_t                   atomic;               /*    20     4 */
        struct {
                char *             name;                 /*    24     8 */
                struct route_map * map;                  /*    32     8 */
        } rmap;                                          /*    24    16 */
        struct prefix_rd           prd __attribute__((__aligned__(8))); /*    40    16 */
        char *                     prd_pretty;           /*    56     8 */
        /* --- cacheline 1 boundary (64 bytes) --- */
        mpls_label_t               label;                /*    64     4 */

        /* XXX 4 bytes hole, try to pack */

        esi_t *                    eth_s_id;             /*    72     8 */
        struct ethaddr *           router_mac;           /*    80     8 */
        struct prefix              gatewayIp __attribute__((__aligned__(8))); /*    88    56 */

        /* size: 144, cachelines: 3, members: 14 */
        /* sum members: 139, holes: 2, sum holes: 5 */
        /* forced alignments: 2 */
        /* last cacheline: 16 bytes */
} __attribute__((__aligned__(8)));   /* saved 8 bytes! */
```

Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
2024-01-30 08:12:59 +02:00
Donatas Abraitis
4821e7a0d8 bgpd: Optimize memory usage for attr struct
```
struct attr {
	struct aspath *            aspath;               /*     0     8 */
	struct community *         community;            /*     8     8 */
	long unsigned int          refcnt;               /*    16     8 */
	_uint64_t                  flag;                 /*    24     8 */
	struct in_addr             nexthop;              /*    32     4 */
	uint32_t                   med;                  /*    36     4 */
	uint32_t                   local_pref;           /*    40     4 */
	ifindex_t                  nh_ifindex;           /*    44     4 */
	uint8_t                    origin;               /*    48     1 */
	uint8_t                    es_flags;             /*    49     1 */
	uint8_t                    router_flag;          /*    50     1 */
	uint8_t                    default_gw;           /*    51     1 */
	enum pta_type              pmsi_tnl_type;        /*    52     4 */
	uint32_t                   rmap_change_flags;    /*    56     4 */
	struct in6_addr            mp_nexthop_global;    /*    60    16 */
	/* --- cacheline 1 boundary (64 bytes) was 12 bytes ago --- */
	struct in6_addr            mp_nexthop_local;     /*    76    16 */
	ifindex_t                  nh_lla_ifindex;       /*    92     4 */
	struct ecommunity *        ecommunity;           /*    96     8 */
	struct ecommunity *        ipv6_ecommunity;      /*   104     8 */
	struct lcommunity *        lcommunity;           /*   112     8 */
	struct cluster_list *      cluster1;             /*   120     8 */
	/* --- cacheline 2 boundary (128 bytes) --- */
	struct transit *           transit;              /*   128     8 */
	struct in_addr             mp_nexthop_global_in; /*   136     4 */
	struct in_addr             aggregator_addr;      /*   140     4 */
	struct in_addr             originator_id;        /*   144     4 */
	uint32_t                   weight;               /*   148     4 */
	as_t                       aggregator_as;        /*   152     4 */
	uint8_t                    mp_nexthop_len;       /*   156     1 */
	uint8_t                    mp_nexthop_prefer_global; /*   157     1 */
	uint8_t                    sticky;               /*   158     1 */
	uint8_t                    distance;             /*   159     1 */
	uint16_t                   encap_tunneltype;     /*   160     2 */
	uint8_t                    df_alg;               /*   162     1 */

	/* XXX 1 byte hole, try to pack */

	route_tag_t                tag;                  /*   164     4 */
	uint32_t                   label_index;          /*   168     4 */
	mpls_label_t               label;                /*   172     4 */
	struct bgp_attr_srv6_vpn * srv6_vpn;             /*   176     8 */
	struct bgp_attr_srv6_l3vpn * srv6_l3vpn;         /*   184     8 */
	/* --- cacheline 3 boundary (192 bytes) --- */
	struct bgp_attr_encap_subtlv * encap_subtlvs;    /*   192     8 */
	struct bgp_attr_encap_subtlv * vnc_subtlvs;      /*   200     8 */
	struct bgp_route_evpn      evpn_overlay;         /*   208    36 */
	uint32_t                   mm_seqnum;            /*   244     4 */
	uint32_t                   mm_sync_seqnum;       /*   248     4 */
	struct ethaddr             rmac;                 /*   252     6 */
	/* --- cacheline 4 boundary (256 bytes) was 2 bytes ago --- */
	uint16_t                   df_pref;              /*   258     2 */
	uint32_t                   rmap_table_id;        /*   260     4 */
	uint32_t                   link_bw;              /*   264     4 */
	esi_t                      esi;                  /*   268    10 */

	/* XXX 2 bytes hole, try to pack */

	uint32_t                   srte_color;           /*   280     4 */
	uint32_t                   otc;                  /*   284     4 */
	enum nexthop_types_t       nh_type;              /*   288     4 */
	enum blackhole_type        bh_type;              /*   292     4 */
	_uint64_t                  aigp_metric;          /*   296     8 */

	/* size: 304, cachelines: 5, members: 53 */
	/* sum members: 301, holes: 2, sum holes: 3 */
	/* last cacheline: 48 bytes */
};   /* saved 16 bytes! */
```

Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
2024-01-30 08:12:58 +02:00
Donatas Abraitis
89e124f042 bgpd: Optimize memory usage for bgp_nlri struct
```
struct bgp_nlri {
	uint16_t                   afi;                  /*     0     2 */
	uint8_t                    safi;                 /*     2     1 */

	/* XXX 1 byte hole, try to pack */

	bgp_size_t                 length;               /*     4     2 */

	/* XXX 2 bytes hole, try to pack */

	uint8_t *                  nlri;                 /*     8     8 */

	/* size: 16, cachelines: 1, members: 4 */
	/* sum members: 13, holes: 2, sum holes: 3 */
	/* last cacheline: 16 bytes */
};   /* saved 8 bytes! */
```

Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
2024-01-29 15:37:23 +02:00
Donatas Abraitis
7133cce196 bgpd: Optimize memory usage for bgp_notify struct
```
struct bgp_notify {
	uint8_t                    code;                 /*     0     1 */
	uint8_t                    subcode;              /*     1     1 */
	bgp_size_t                 length;               /*     2     2 */
	_Bool                      hard_reset;           /*     4     1 */

	/* XXX 3 bytes hole, try to pack */

	char *                     data;                 /*     8     8 */
	uint8_t *                  raw_data;             /*    16     8 */

	/* size: 24, cachelines: 1, members: 6 */
	/* sum members: 21, holes: 1, sum holes: 3 */
	/* last cacheline: 24 bytes */
};   /* saved 16 bytes! */
```
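
The effect of grouping the small members ahead of the pointers can be reproduced with a small C sketch. The `notify_before` ordering is an assumption made for illustration (the message above shows only the resulting layout), and all struct and function names here are hypothetical. The sizes mentioned afterwards assume a typical LP64 target such as x86-64 Linux.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed "before" ordering: 1-2 byte members interleaved with pointers,
 * so the compiler pads before each pointer and at the end of the struct. */
struct notify_before {
	uint8_t code;
	uint8_t subcode;
	char *data;
	uint16_t length;
	uint8_t *raw_data;
	bool hard_reset;
};

/* Ordering from the pahole output above: small members first. */
struct notify_after {
	uint8_t code;
	uint8_t subcode;
	uint16_t length; /* bgp_size_t is 2 bytes in the output above */
	bool hard_reset;
	char *data;
	uint8_t *raw_data;
};

/* Bytes saved by the reordering on the current target. */
static size_t notify_bytes_saved(void)
{
	return sizeof(struct notify_before) - sizeof(struct notify_after);
}

/* Size of the hole between hard_reset and data in the packed layout. */
static size_t notify_after_hole(void)
{
	return offsetof(struct notify_after, data)
	       - (offsetof(struct notify_after, hard_reset) + sizeof(bool));
}

/* Reordering members must never grow the struct. */
static void notify_layout_check(void)
{
	assert(sizeof(struct notify_after) <= sizeof(struct notify_before));
}
```

On x86-64 the assumed original ordering occupies 40 bytes and the packed one 24 bytes, with a single 3-byte hole before `data`, which is consistent with the figures reported above.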

Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
2024-01-29 15:29:25 +02:00
Louis Scalbert
14e51be394 bgpd: fix VRF leaking with 'network import-check' (4/4)
The following configuration creates an infinite route-leaking loop
because the 'rt vpn both' parameters are the same in both VRFs.

> router bgp 5227 vrf r1-cust4
>    no bgp network import-check
>    bgp router-id 192.168.1.1
>    address-family ipv4 unicast
>      network 28.0.0.0/24
>      rd vpn export 10:12
>      rt vpn both 52:100
>      import vpn
>      export vpn
>    exit-address-family
> !
> router bgp 5227 vrf r1-cust5
>    no bgp network import-check
>    bgp router-id 192.168.1.1
>    address-family ipv4 unicast
>      network 29.0.0.0/24
>      rd vpn export 10:13
>      rt vpn both 52:100
>      import vpn
>      export vpn
>    exit-address-family

The previous commit added a route leak update when a nexthop update is
received from zebra. That update indirectly calls
bgp_find_or_add_nexthop(), in which a static route triggers a nexthop
cache entry registration, which in turn triggers another nexthop update
from zebra.

Do not register the nexthop cache entry again if BGP_STATIC_ROUTE is
already set.
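
The guard can be sketched in isolation as follows. This is a simplified model, not the bgpd code: `struct nhc_entry`, `NHC_STATIC_ROUTE`, `nhc_register_static()` and the registration counter are hypothetical stand-ins for the nexthop cache entry, the BGP_STATIC_ROUTE flag, and the registration with zebra.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NHC_STATIC_ROUTE (1u << 0) /* hypothetical stand-in flag */

struct nhc_entry {
	uint32_t flags;
	unsigned int registrations; /* how many times we "asked zebra" */
};

/* Register the entry for a static route only once.
 * Returns true if a registration was actually sent. */
static bool nhc_register_static(struct nhc_entry *e)
{
	assert(e);
	if (e->flags & NHC_STATIC_ROUTE)
		return false; /* already set: do not trigger another update */

	e->flags |= NHC_STATIC_ROUTE;
	e->registrations++;
	return true;
}

/* Model of the feedback path: every nexthop update re-evaluates the static
 * route, which would register again without the guard above. */
static unsigned int nhc_simulate_updates(struct nhc_entry *e, unsigned int n)
{
	unsigned int i;

	for (i = 0; i < n; i++)
		nhc_register_static(e);
	return e->registrations;
}
```

In this model, however many updates arrive, the entry is registered exactly once, so the update-register-update cycle cannot sustain itself.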

Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
2024-01-29 10:47:00 +01:00
Louis Scalbert
879bfc01c8 bgpd: fix VRF leaking with 'network import-check' (3/4)
If 'bgp network import-check' is defined on the source BGP session and
the 'rt import' statement is defined after the 'network <prefix>'
statements, prefixes defined with the network command cannot be leaked
to the other VRFs' BGP tables, even if they are present in the origin
VRF RIB.

When a prefix nexthop is updated, update the route leaking for that
prefix. The current state of nexthop validation is now stored in the
attributes of the bgp path info. At each route leaking update the
attributes are compared with the previous ones, so a nexthop validation
change now triggers an update of the destination VRF BGP table.
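
One way to picture the mechanism: once the nexthop validation state is part of the attribute set, a plain attribute comparison is enough to detect a validation change. The sketch below is a reduced model with hypothetical names (`struct leak_attr`, `nh_valid`, `leak_update_needed()`); it does not reproduce bgpd's attribute structure or its comparison function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, reduced attribute set carried by a path. */
struct leak_attr {
	uint32_t med;
	uint32_t local_pref;
	bool nh_valid; /* nexthop validation state stored with the attributes */
};

/* Field-by-field equality, including the validation state. */
static bool leak_attr_same(const struct leak_attr *a, const struct leak_attr *b)
{
	return a->med == b->med && a->local_pref == b->local_pref
	       && a->nh_valid == b->nh_valid;
}

/* A leak update towards the destination VRF is needed when any attribute,
 * including the validation state, differs from the previous one. */
static bool leak_update_needed(const struct leak_attr *prev,
			       const struct leak_attr *cur)
{
	assert(prev && cur);
	return !leak_attr_same(prev, cur);
}
```

With this shape, a nexthop that flips from invalid to valid makes the two attribute sets differ even when every other field is unchanged, so the leak is refreshed.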

Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
2024-01-29 10:46:43 +01:00
Louis Scalbert
bb71bc02fd bgpd: fix VRF leaking with 'network import-check' (2/4)
"if not XX else" statements are confusing.

Replace two "if not XX else" statements with "if XX else" to prepare for
the next commits. This patch is purely cosmetic.
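
As an illustration of the rewrite (with hypothetical function names, not the actual bgpd code), the two forms below are equivalent; only the order of the branches changes.

```c
#include <assert.h>
#include <stdbool.h>

/* Before: negated condition with an else branch. */
static int pick_negated(bool xx)
{
	if (!xx)
		return 0; /* "not XX" case */
	else
		return 1; /* "XX" case */
}

/* After: positive condition first, same behavior. */
static int pick_positive(bool xx)
{
	if (xx)
		return 1; /* "XX" case */
	else
		return 0; /* "not XX" case */
}

/* Both forms must agree for either input. */
static void pick_check(void)
{
	assert(pick_negated(false) == pick_positive(false));
	assert(pick_negated(true) == pick_positive(true));
}
```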

Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
2024-01-29 10:30:37 +01:00
Louis Scalbert
6de0cd9bdf bgpd: fix VRF leaking with 'network import-check' (1/4)
If 'bgp network import-check' is defined on the source BGP session,
prefixes that are defined with the network command cannot be leaked to
the other VRFs' BGP tables, even if they are present in the origin VRF
RIB.

Always validate the nexthop of BGP static routes (i.e. defined with the
network statement) if 'network import-check' is defined on the source
BGP session and the prefix is present in the source RIB.

Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
2024-01-29 10:30:37 +01:00