Refine check on whether a route can be injected into EVPN to allow
EVPN-sourced routes to be injected back into another instance.
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
Reviewed-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
Reviewed-by: Donald Sharp <sharpd@cumulusnetworks.com>
The check on which routes are exportable is a superset, so remove the
route sub-type checks. Also, this change is needed to handle EVPN-imported
leaked routes correctly.
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
IPv4 or IPv6 unicast routes which are imported from EVPN routes
(type-2 or type-5) and installed in a BGP instance and then leaked
do not need any nexthop tracking, as any tracking should happen in
the source instance.
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
Reviewed-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
Reviewed-by: Donald Sharp <sharpd@cumulusnetworks.com>
A non-imported route or a non-VPN imported route is a candidate to be
exported into the VPN routing table for leaking to other BGP instances
or advertisement into BGP/MPLS VPN. The former is a local or learnt
IPv4 or IPv6 route. The latter is an IPv4 or IPv6 route that is based
on a received EVPN type-2 or type-5 route.
Implement a function to specify if a route can be exported into VPN
and use in the appropriate places.
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
Reviewed-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
Reviewed-by: Donald Sharp <sharpd@cumulusnetworks.com>
IPv4 or IPv6 unicast routes which are imported from EVPN routes
(type-2 or type-5) and installed in a BGP instance can be leaked
to another instance.
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
Reviewed-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
Reviewed-by: Donald Sharp <sharpd@cumulusnetworks.com>
In the case of EVPN symmetric routing, the tenant VRF is associated with
a VNI that is used for routing and commonly referred to as the L3 VNI or
VRF VNI. Corresponding to this VNI is a VLAN and its associated L3 (IP)
interface (SVI). Overlay next hops (i.e., next hops for routes in the
tenant VRF) are reachable over this interface. Howver, in the model that
is supported in the implementation and commonly deployed, there is no
explicit Overlay IP address associated with the next hop in the tenant
VRF; the underlay IP is used if (since) the forwarding plane requires
a next hop IP. Therefore, the next hop has to be explicit flagged as
onlink to cause any next hop reachability checks in the forwarding plane
to be skipped.
https://tools.ietf.org/html/draft-ietf-bess-evpn-prefix-advertisement
section 4.4 provides additional description of the above constructs.
Use existing mechanism to specify the nexthops as onlink when installing
these routes from bgpd to zebra and get rid of a special flag that was
introduced for EVPN-sourced routes. Also, use the onlink flag during next
hop validation in zebra and eliminate other special checks.
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
Reviewed-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
Reviewed-by: Donald Sharp <sharpd@cumulusnetworks.com>
In the case of EVPN symmetric routing, the tenant VRF is associated with
a VNI that is used for routing and commonly referred to as the L3 VNI or
VRF VNI. Corresponding to this VNI is a VLAN and its associated L3 (IP)
interface (SVI). Overlay next hops (i.e., next hops for routes in the
tenant VRF) are reachable over this interface.
https://tools.ietf.org/html/draft-ietf-bess-evpn-prefix-advertisement
section 4.4 provides additional description of the above constructs.
Use the L3 interface exchanged between zebra and bgp in route install.
This patch in conjunction with the earlier one helps to eliminate some
special code in zebra to derive the next hop's interface.
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
Reviewed-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
Reviewed-by: Donald Sharp <sharpd@cumulusnetworks.com>
In the case of EVPN symmetric routing, the tenant VRF is associated with
a VNI that is used for routing and commonly referred to as the L3 VNI or
VRF VNI. Corresponding to this VNI is a VLAN and its associated L3 (IP)
interface (SVI). Overlay next hops (i.e., next hops for routes in the
tenant VRF) are reachable over this interface.
https://tools.ietf.org/html/draft-ietf-bess-evpn-prefix-advertisement
section 4.4 provides additional description of the above constructs.
The implementation currently derives this L3 interface for EVPN tenant
routes using special code that looks at route flags. This patch
exchanges the L3 interface between zebra and bgpd as part of the L3-VNI
exchange in order to eliminate some this special code.
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
Reviewed-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
Reviewed-by: Donald Sharp <sharpd@cumulusnetworks.com>
When a IPv4 or IPv6 route that was formerly allowed by the route-map
to be injected into EVPN gets an updated set of attributes that now
causes it to be filtered, the route needs to be pulled out of EVPN.
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
Reviewed-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
Reviewed-by: Donald Sharp <sharpd@cumulusnetworks.com>
Withdraw flag is not sufficient to call bgp_update vs. bgp_withdraw()
processing for a given BGP evpn update message.
When a bgp update needs to be treated as an implicit withdraw
(e.g., due to malformed attribute), the code wasn't handling
things properly.
Rearranging attribute pass field to type-5 route processing and aligning
similar to done for other routes (type2/type-3).
Ticket:CM-24003
Reviewed By:CCR-8330
Testing Done:
Singed-off-by: Chirag Shah <chirag@cumulusnetworks.com>
A few code paths weren't handling the vpnv6 nexthop lenghts as
expected, which was leading to problems like imported vpnv6 routes
not being marked as valid when they should. Fix this.
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
BGP IPv6 routes should never contain the NEXT_HOP attribute
(MP_REACH_NLRI should be used instead).
This reverts commit 75cd35c697.
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
peer_flag_modify() will always return BGP_ERR_INVALID_FLAG because
the action was not defined for PEER_FLAG_IFPEER_V6ONLY flag.
```
global PEER_FLAG_IFPEER_V6ONLY = 16384;
global BGP_ERR_INVALID_FLAG = -2;
probe process("/usr/lib/frr/bgpd").statement("peer_flag_modify@/root/frr/bgpd/bgpd.c:3975")
{
if ($flag == PEER_FLAG_IFPEER_V6ONLY && $action->type == 0)
printf("action not found for the flag PEER_FLAG_IFPEER_V6ONLY\n");
}
probe process("/usr/lib/frr/bgpd").function("peer_flag_modify").return
{
if ($return == BGP_ERR_INVALID_FLAG)
printf("return BGP_ERR_INVALID_FLAG\n");
}
```
produces:
action not found for the flag PEER_FLAG_IFPEER_V6ONLY
return BGP_ERR_INVALID_FLAG
$ vtysh -c 'conf t' -c 'router bgp 20' -c 'neighbor eth1 interface v6only remote-as external'
Signed-off-by: Donatas Abraitis <donatas.abraitis@gmail.com>
Display only ipv4 neighbors when 'show bgp ipv4 neighbors' command is issued.
Display only ipv6 neighbors when 'show bgp ipv6 neighbors' command is issued.
Take the address family of the peer address into account, while displaying the neighbors.
Signed-off-by: Akhilesh Samineni <akhilesh.samineni@broadcom.com>
The community_delete and lcommunity_delete functionality was
creating a special string that needed to be specially parsed.
Remove all this string creation and just pass the pertinent
data into the appropriate functions.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The struct prefix *prefix is really a const struct prefix *
This was causing compile warns->errors on some compilers
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
In order to iterate over MPLS VPN routes, it's necessary to use
two nested loops (the outer loop iterates over the MPLS VPN RDs,
and the inner loop iterates over the VPN routes from that RD).
The add-path code wasn't doing this, which was leading to lots of
crashes when add-path was enabled for the MPLS VPN SAFI. This patch
fixes the problem.
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
If path->net is NULL in the bgp_path_info_free() function, then
bgpd would crash in bgp_addpath_free_info_data() with the following
backtrace:
(gdb) bt
#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#1 0x00007ff7b267a42a in __GI_abort () at abort.c:89
#2 0x00007ff7b39c1ca0 in core_handler (signo=11, siginfo=0x7ffff66414f0, context=<optimized out>) at lib/sigevent.c:249
#3 <signal handler called>
#4 idalloc_free_to_pool (pool_ptr=pool_ptr@entry=0x0, id=3) at lib/id_alloc.c:368
#5 0x0000560096246688 in bgp_addpath_free_info_data (d=d@entry=0x560098665468, nd=0x0) at bgpd/bgp_addpath.c:100
#6 0x00005600961bb522 in bgp_path_info_free (path=0x560098665400) at bgpd/bgp_route.c:252
#7 bgp_path_info_unlock (path=0x560098665400) at bgpd/bgp_route.c:276
#8 0x00005600961bb719 in bgp_path_info_reap (rn=rn@entry=0x5600986b2110, pi=pi@entry=0x560098665400) at bgpd/bgp_route.c:320
#9 0x00005600961bf4db in bgp_process_main_one (safi=SAFI_MPLS_VPN, afi=AFI_IP, rn=0x5600986b2110, bgp=0x560098587320) at bgpd/bgp_route.c:2476
#10 bgp_process_wq (wq=<optimized out>, data=0x56009869b8f0) at bgpd/bgp_route.c:2503
#11 0x00007ff7b39d5fcc in work_queue_run (thread=0x7ffff6641e10) at lib/workqueue.c:294
#12 0x00007ff7b39ce3b1 in thread_call (thread=thread@entry=0x7ffff6641e10) at lib/thread.c:1606
#13 0x00007ff7b39a3538 in frr_run (master=0x5600980795b0) at lib/libfrr.c:1011
#14 0x000056009618a5a3 in main (argc=3, argv=0x7ffff6642078) at bgpd/bgp_main.c:481
Add a null-check protection to fix this problem.
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
For VRF route leak, enable route map filter based
on "source-vrf" check.
Implemented match filter rule for "source-vrf" which
compares leaked routes original vrf_id (where it leaked from)
during importing into target VRF.
Ticket:CM-23776
Reviewed By:
Testing Done:
Configure vrf route leak from vrf1 to vrf2,
configure import vrf under vrf2 along with route-map
with source-vrf filter.
Add and remove source-vrf filter and checked routes
were added and removed to vrf2 table via vpn (default) table.
Signed-off-by: Chirag Shah <chirag@cumulusnetworks.com>
"When building without VNC, automake sees that the `bgpd_bgpd_CFLAGS`
variable exists, although it is only set in the VNC-enabled case... but
since the variable exists, it unconditionally drops `AM_CFLAGS` for the
two bgp targets and uses `bgpd_bgpd_CFLAGS` instead, which will
contain... _nothing_."
This was breaking builds of bgpd binaries with MSAN enabled.
Signed-off-by: David Lamparter <equinox@diac24.net>
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
Consider the following topo VTEP1->SPINE1->VTEP2. ebgp is being used
for evpn route exchange with SPINE just acting as a pass through.
1. VTEP1 was building the type-3 IMET route with the correct PMSI
tunnel type (ingress-replication) and label (VNI)
2. Spine1 was however only parsing the tunnel-type in the attr (was
skipping parsing of the label field altogether) -
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
root@MSP1:~# net show bgp l2vpn evpn route rd 27.0.0.15:4 type multicast
EVPN type-2 prefix: [2]:[ESI]:[EthTag]:[MAClen]:[MAC]
EVPN type-3 prefix: [3]:[EthTag]:[IPlen]:[OrigIP]
EVPN type-5 prefix: [5]:[ESI]:[EthTag]:[IPlen]:[IP]
BGP routing table entry for 27.0.0.15:4:[3]:[0]:[32]:[27.0.0.15]
Paths: (1 available, best #1)
Advertised to non peer-group peers:
TORC11(downlink-1) TORC12(downlink-2) TORC21(downlink-3) TORC22(downlink-4) TORS1(downlink-5) TORS2(downlink-6)
Route [3]:[0]:[32]:[27.0.0.15]
5550
27.0.0.15 from TORS1(downlink-5) (27.0.0.15)
Origin IGP, valid, external, bestpath-from-AS 5550, best
Extended Community: RT:5550:1003 ET:8
AddPath ID: RX 0, TX 227
Last update: Thu Feb 7 15:44:22 2019
PMSI Tunnel Type: Ingress Replication, label: 16777213 >>>>>>>
Displayed 1 prefixes (1 paths) with this RD (of requested type)
root@MSP1:~#
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
3. So VTEP2 didn't rx the correct label.
In an all FRR setup this doesn't have any functional consequence but some
vendors are validating the content of the label field as well and ignoring
the IMET route from FRR (say VTEP1 is FRR and VTEP2 is 3rd-party). The
functional consequence of this VTEP2 ignores VTEP1's IMET route and doesn't
add VTEP1 to the corresponding l2-vni flood list.
This commit fixes up the PMSI attr parsing on spine-1 -
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
root@MSP1:~# net show bgp l2vpn evpn route rd 27.0.0.15:4 type multicast
EVPN type-2 prefix: [2]:[ESI]:[EthTag]:[MAClen]:[MAC]
EVPN type-3 prefix: [3]:[EthTag]:[IPlen]:[OrigIP]
EVPN type-5 prefix: [5]:[ESI]:[EthTag]:[IPlen]:[IP]
BGP routing table entry for 27.0.0.15:4:[3]:[0]:[32]:[27.0.0.15]
Paths: (1 available, best #1)
Advertised to non peer-group peers:
TORC11(downlink-1) TORC12(downlink-2) TORC21(downlink-3) TORC22(downlink-4) TORS1(downlink-5) TORS2(downlink-6)
Route [3]:[0]:[32]:[27.0.0.15]
5550
27.0.0.15 from TORS1(downlink-5) (27.0.0.15)
Origin IGP, valid, external, bestpath-from-AS 5550, best
Extended Community: RT:5550:1003 ET:8
AddPath ID: RX 0, TX 278
Last update: Thu Feb 7 00:17:40 2019
PMSI Tunnel Type: Ingress Replication, label: 1003 >>>>>>>>>>>
Displayed 1 prefixes (1 paths) with this RD (of requested type)
root@MSP1:~#
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
Ticket: CM-23790
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
Currently we are hardcoding it at the time of attr building to
ingress-replication. This is just a code clean-up and has no
functional impact.
Ticket: CM-23790
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
Commit b4897fa5 introduced a call to decrement a route-map counter,
applied to a prefix-list in bgp_rfapi_cfg.c. This commit removes
that call.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
there are some cases where the bgp deletion will not be complete, while
the vrf identifier of the bgp instance is not completely identified. The
vrf search based on the bgp name is the better protection, since the bgp
vrf instance is created, even if the vrf identifier is not yet known.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Made changes and updated the routemap applied counter in the following flows.
1.Increment when route map attached to a list.
2.Decrement when route map removed / modified from a list.
3.Increment/decrement when route map create/delete callback triggered.
4.Besides ,This counter need not be updated when a route map is got updated.
i.e changing/adding a match value to the existing routemap.
In BGP , same update api called for all three add/delete/update operation .
But this counter have to be updated only for routemap addition.
Addressed this specific change by identifying the routemap operation based
on routemap pointer.
Signed-off-by: RajeshGirada <rgirada@vmware.com>
- some target_CFLAGS that needed to include AM_CFLAGS didn't do so
- libyang/sysrepo/sqlite3/confd CFLAGS + LIBS weren't used at all
- consistently use $(FOO_CFLAGS) instead of @FOO_CFLAGS@
- 2 dependencies were missing for clippy
Signed-off-by: David Lamparter <equinox@diac24.net>
the maximum value for stalepath timer is extended to 4095 to align with
bgp restart timer value.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
rfc of bgp graceful restart mechanism permits to increase the
restart timer, since its value is encoded on 12 bit.
So make available the possibility to extend it.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Executed some evpn related tests with valgrind and saw some errors
related to uninitialized memory and overlapping memcpy. This commit
fixes those.
Ticket: CM-21218
Signed-off-by: Nitin Soni <nsoni@cumulusnetworks.com>
Reviewed-by: CCR-8249
the list of iprules is displayed in the 'show bgp ipv4 flowspec detail'
The list of iprules is displayed, only if it is installed.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
two kind of rules are being set from bgp flowspec: ipset based rules,
and ip rule rules. default route rules may have a lower priority than
the other rules ( that do not support default rules). so, if an ipset
rule without fwmark is being requested, then priority is arbitrarily set
to 1. the other case, priority is set to 0.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
because ip rule creation is used to not only handle traffic marked by
fwmark; but also for conveying traffic with from/to rules, a check of
the creation must be done in the linked list of ip rules.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
adding/suppressing flowspec to pbr is supported. the add and the remove
code is being added. now,bgp supports the hash list of ip rule list.
The removal of bgp ip rule is done via search. The search uses the
action field. the reason is that when a pbr rule is added, to replace an
old one, the old one is kept until the new one is installed, so as to
avoid traffic to be cut. This is why at one moment, one can have two
same iprules with different actions. And this is why the algorithm
covers this case.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
now, ip rule can be created from two differnt ways; however a single
zebra API has been defined. so make it consistent by adding a parameter
to the bgp zebra layer. the function will handle the rest.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Before, it was not possible to create any rules. Now, it is possible to
have flowspec rules relying only on ip rule command. The check is done
here.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
that iprule list stands for the list of fs entries that are created,
based only on ip rule from/to rule.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
instead of using ipset based mechanism to forward packets, there are
cases where it is possible to use ip rule based mechanisms (without
ipset). Here, this applies to simple fs rules with only 'from any' or
'to any'.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
main bgp structure that contains fs information is being cleaned.
some fields are removed.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
bgp instance is disabling the label allocated to reach vrf entity.
previously, only vrf disabling was removing the label. now, when bgp
leaves, bgp instance also frees the label used.
PR=62306
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Acked-by: Julien Floret <julien.floret@6wind.com>
When a interface based peer is setup and if it is part of a peer
group we should ignore this and just use the PEER_FLAG_CAPABILITY_ENHE
no matter what.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
bgp would crash with various `show bgp neighbor json` commands
based upon whether or not it did a pretty print of the output
or not. This is because we were freeing the data 2 times.
Cleanup so that we free the json data 1 time.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
When an inactive-neigh delete is rxed bgp will not have a local path to
remove (and re-run path selection). Instead it simply re-installs the
current best remote path if any.
Ticket: CM-23018
Testing Done: evpn-min
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
Move the info filling for zebra mac-ip install (sent by bgpd) to a
common place.
The commit also fixes missing ROUTER flag for one of the cases
added in a code branch that doesn't have the ROUTER changes -
[
6d8c603a
bgpd: use IP address as tie breaker if the MM seq number is the same
]
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
Route-map filtering is based on the value of
"bgp->adv_cmd_rmap[afi][safi].map". For example, we advertise routes in
bgp_evpn_advertise_type5_routes() based on the value of
"bgp->adv_cmd_rmap[afi][safi].map". This variable gets populated in vty
handler bgp_evpn_advertise_type5. This variable will not get populated
if we have not yet applied the route-map configuration. The fix is to
correctly populate "bgp->adv_cmd_rmap[afi][safi].map" in
bgp_route_map_process_update() if it has not been populated before.
Ticket: CM-23263
Signed-off-by: Nitin Soni <nsoni@cumulusnetworks.com>
Reviewed-by: CCR-8163
Problem reported that with certain sequences of defining the
remote-as on the peer-group and the members, the configuration would
become wrong, with configured remote-as settings not reflected in
the config but peers unable to come up. This fix resolves these
inconsistencies.
Ticket: CM-19560
Signed-off-by: Don Slice <dslice@cumulusnetworks.com>
when removing bgp instance, the parsing of rm->info contexts must be
protected. Also, the main level of hierarchy of rds must not be
allocated more than once.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Now that all daemons receive the VRF backend from zebra, we can get
rid of vrf_is_mapped_on_netns() in favor of using the more convenient
vrf_is_backend_netns() function, which doesn't require any argument.
This commit also fixes the following problem:
debian(config)# ip route 50.0.0.0/8 blackhole vrf FAKE table 2
% table param only available when running on netns-based vrfs
Even when zebra was started with the --vrfwnetns, the error
above would be displayed since the VRF FAKE didn't exist, which
would make vrf_is_mapped_on_netns() return 0 incorrectly. Using
vrf_is_backend_netns() this problem doesn't happen anymore.
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
strlen is the same as sizeof when the memory is dynamically allocated
but it is not the same when the memory being looked at is an array.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The current invocation of frr_pthread_set_name was causing it reset the os_name.
There is no need for this, we now always create the pthread appropriately
to have both name and os_name. So convert this function to a simple
call through of the pthread call now.
Before(any of these changes):
sharpd@robot ~/frr1> ps -L -p 16895
PID LWP TTY TIME CMD
16895 16895 ? 00:01:39 bgpd
16895 16896 ? 00:00:54
16895 16897 ? 00:00:07 bgpd_ka
After:
sharpd@donna ~/frr1> ps -L -p 1752
PID LWP TTY TIME CMD
1752 1752 ? 00:00:00 bgpd
1752 1753 ? 00:00:00 bgpd_io
1752 1754 ? 00:00:00 bgpd_ka
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
When using an `import vrf` mechanism we are marking
the vrf label as BGP_PREVENT_VRF_2_VRF_LEAK, and then sending
this down to zebra. Since zebra knows nothing about this special
value, convert it to a value that it does know MPLS_LABEL_NONE.
This bug was introduced by: 13b7e7f007
And shows up with this error message in the zebra log:
2019/01/09 08:25:16 ZEBRA: Extended Error: Label >= configured maximum in platform_labels
2019/01/09 08:25:16 ZEBRA: [EC 4043309093] netlink-cmd (NS 0) error: Invalid argument, type=RTM_NEWROUTE(24), seq=8, pid=3321825991
2019/01/09 08:25:16 ZEBRA: [EC 4043309103] LSP Install Failure: 4294967294
And zebra kept the label as:
donna.cumulusnetworks.com# show mpls table
Inbound Outbound
Label Type Nexthop Label
-------- ------- --------------- --------
-2 BGP GREEN
-2 BGP BLUE
After this fix, neither the labels are stored in zebra nor do we see
the log error message.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Further refine the previous commit to store the hash value in
both the `struct community_list` as well as the `struct rmap_community`
structures. This allows us to know a priori what our hash value
is. This change cuts another couple of seconds of convergence
off to ~55 seconds and further reduces cpu load of bgp:
16 40061.706 433732 92 330102 129 1242965 RWTEX TOTAL
Down from ~43 seconds previously.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The community_list_lookup function in a situation where you have
a large number of communities and route-maps that reference them
becomes a very expensive operation( effectively a linked list walk
per route per route-map you apply per peer that has a routemap that
refereces a community, ecommunity or lcommunity. This is a very
expensive operation.
In my testbed, I have a full bgp feed that feeds into 14 namespace
view based bgp processes and finally those 14 feed into a final
namespace FRR instance that has route-maps applied to each
incoming peer for in and out:
!
router bgp 65033
bgp bestpath as-path multipath-relax
neighbor 192.168.41.1 remote-as external
neighbor 192.168.42.2 remote-as external
neighbor 192.168.43.3 remote-as external
neighbor 192.168.44.4 remote-as external
neighbor 192.168.45.5 remote-as external
neighbor 192.168.46.6 remote-as external
neighbor 192.168.47.7 remote-as external
neighbor 192.168.48.8 remote-as external
neighbor 192.168.49.9 remote-as external
neighbor 192.168.50.10 remote-as external
neighbor 192.168.51.11 remote-as external
neighbor 192.168.52.12 remote-as external
neighbor 192.168.53.13 remote-as external
neighbor 192.168.54.14 remote-as external
!
address-family ipv4 unicast
neighbor 192.168.42.2 prefix-list two-in in
neighbor 192.168.42.2 route-map two-in in
neighbor 192.168.42.2 route-map two-out out
neighbor 192.168.43.3 prefix-list three-in in
neighbor 192.168.43.3 route-map three-in in
neighbor 192.168.43.3 route-map three-out out
neighbor 192.168.44.4 prefix-list four-in in
neighbor 192.168.44.4 route-map four-in in
neighbor 192.168.44.4 route-map four-out out
neighbor 192.168.45.5 prefix-list five-in in
neighbor 192.168.45.5 route-map five-in in
neighbor 192.168.45.5 route-map five-out out
neighbor 192.168.46.6 prefix-list six-in in
neighbor 192.168.46.6 route-map six-in in
neighbor 192.168.46.6 route-map six-out out
neighbor 192.168.47.7 prefix-list seven-in in
neighbor 192.168.47.7 route-map seven-in in
neighbor 192.168.47.7 route-map seven-out out
neighbor 192.168.48.8 prefix-list eight-in in
neighbor 192.168.48.8 route-map eight-in in
neighbor 192.168.48.8 route-map eight-out out
neighbor 192.168.49.9 prefix-list nine-in in
neighbor 192.168.49.9 route-map nine-in in
neighbor 192.168.49.9 route-map nine-out out
neighbor 192.168.50.10 prefix-list ten-in in
neighbor 192.168.50.10 route-map ten-in in
neighbor 192.168.50.10 route-map ten-out out
neighbor 192.168.51.11 prefix-list eleven-in in
neighbor 192.168.51.11 route-map eleven-in in
neighbor 192.168.51.11 route-map eleven-out out
neighbor 192.168.52.12 prefix-list twelve-in in
neighbor 192.168.52.12 route-map twelve-in in
neighbor 192.168.52.12 route-map twelve-out out
neighbor 192.168.53.13 prefix-list thirteen-in in
neighbor 192.168.53.13 route-map thirteen-in in
neighbor 192.168.53.13 route-map thirteen-out out
neighbor 192.168.54.14 prefix-list fourteen-in in
neighbor 192.168.54.14 route-map fourteen-in in
neighbor 192.168.54.14 route-map fourteen-out out
exit-address-family
!
This configuration on my machine before this change takes about 2:45 to converge
and bgp takes:
Total thread statistics
16 151715.050 493440 307 3464919 335 7376696 RWTEX TOTAL
CPU time as reported by 'show thread cpu'.
After this change BGP takes 58 seconds to converge and uses:
Total thread statistics
-------------------------
16 42954.284 350319 122 295743 157 1194820 RWTEX TOTAL
almost 43 seconds of CPU time.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The community_list_lookup function is being changed in a future
commit. As such we want to use the `struct rmap_community` data
structure for storing compiled information about communities,ecommunities
or lcommunities.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The End of Rib notification in BGP is useful to know no matter
the circumstances. So change this from a debug message to
an info and cleanup the message a bit and add vrf we are in.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
in bgp, even if the main vrf implementation relies on tables, the fact
is some vrf implementation rely on network namespaces, and then the
table used is the default table from the network namespace. Use the
wording vrf instead of table.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Certain EVPN configuartions should only be applied
under DEFAULT VRF bgpd instance.
reject the cli for non default bgp instance
Ticket:CM-18950
Signed-off-by: Chirag Shah <chirag@cumulusnetworks.com>
redirect IP nh of flowspec entry is retrieved so that the nexthop
IP information is injected into the nexthop tracking, and is associated
to the bgp_path structure. This permits validating or unvalidating the
bgp_path for injection in zebra or not.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
This commit is the last missing piece to complete BGP LU support in bgpd. To this moment, bgpd (and zebra) supported auto label assignment only for prefixes leaked from VRFs to vpn and for MPLS SR prefixes. This adds auto label assignment to other routes types in bgpd. The following enhancements have been made:
* bgp_route.c:bgp_process_main_one() now sets implicit-null local_label to all local, aggregate and redistributed routes.
* bgp_route.c:bgp_process_main_one() now will request a label from the label pool for any prefix that loses the label for some reason (for example, when the static label assignment config is removed)
* bgp_label.c:bgp_reg_dereg_for_label() now requests labels from label pool for routes which have no associated label index
* zebra_mpls.c:zebra_mpls_fec_register() now expects both label and label_index from the calling function, one of which must be set to MPLS_INVALID_LABEL or MPLS_INVALID_LABEL_INDEX, based on this it will decide how to register the provided FEC.
Signed-off-by: Anton Degtyarev <anton@cumulusnetworks.com>
These two commands previously required the whole original command but
we should allow the user to shorten out this since the data at the
end is not required to figure out what to delete.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The ability to shorten the extended community commands for routemaps
upon removal should be allowed.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Allow user to enter `no set community` to remove the community
set for the route-map.
Fixes: #3491
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Default vrf name has been changed in show route. Because the default vrf
name can be configured in zebra, the default vrf name in bgp is changed.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
if default vrf is not Default, then nexthop vrf name returned may be
"Default", which is not the correct name of default vrf. change it
accordingly.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
The confederation identifier is a `as_t` type which is a
uint32_t underneath the covers. Display it using a %u
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
cf. https://wiki.debian.org/NonFreeIETFDocuments
These MIBs were in our git purely for documentation purposes, they are
not installed and not needed for building SNMP support.
Signed-off-by: David Lamparter <equinox@diac24.net>
Looks like we missed some code in a non-normal compiled
code path for the bgp_path_info conversion.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Add some code that will reject local mac's from
being installed and add some code that will cause
a rescan when we have a local mac change.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com.
Add a bit of code that allows us to dump the mac hash. Future
commits will actually add entries to the mac hash and then operate
on it.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
When you have configured soft reconfiguration inbound
for evpn allow it to notice and send in the evpn data
as appropriate.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The `struct bgp_route_evpn` and `struct overlay_index` data
structures are exactly the same. Reduce to 1.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
During L3VNI add delete, configured non-default
route-target is not replayed correctly.
Non-default route-target should only be deleted
during unconfiguring under bgp vrf instance,
during delete of l3vni only unmap from the VRF.
during addition of l3vni map back to the VRF
Ticket:CM-21482
Testing Done:
Bring up evpn configuration with L3vni up with
non-default route-target.
Perform delete/add of L3vni and validated non-default
route-target is mapped back to vrf.
Signed-off-by: Chirag Shah <chirag@cumulusnetworks.com>
Fully revert the rest of the e23b9ef6d2 commit as that it was breaking
route leaking between vrf's.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
I bulk-fixed "recieved" as a misspelling in 0437e10... but didn't notice
there was a JSON value among these.
Signed-off-by: David Lamparter <equinox@diac24.net>
The adj_out data structure is a linked list of adjacencies
1 per update group. In a large scale env where we are
not using peer groups, this list lookup starts to become
rather costly. Convert to a better data structure for this.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Don't show the configuration line `rfp full-table-download off` by
default as it is not the default value, instead only show
`rfp full-table-download on` (the non-default value) when it is
configured.
This standardizes this knob to the FRR default behavior.
Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
EVPN route's attribute changes,
mark attribute change flag to imported unicast route.
A scenario where AS_PATH attribute have changed for an EVPN type-5
route, set attribute change
to imported route.
Ticket:CM-23008
Reviewed By:
Testing Done:
Validated via marking EVPN route with AS_PATH prepand.
At the receiving VTEP, ensure attribute change flag is set to
imported unicast route and bgp update sent to VTEPs subsequent
bgp peers with AS_PATH prepend update.
Signed-off-by: Chirag Shah <chirag@cumulusnetworks.com>
1. "show bgp ipv4/ipv6 [json]"
2. "show bgp ipv4/ipv6 neighbor <peer> routes [json]"
3. "show bgp ipv4/ipv6 neighbors <peer> advertised-routes [json]"
In the above show commands, when a BGP path is displayed, we do not display the
local preference if it is EBGP route. Route calculation assumes the default
local preference. But, we can change the default local preference using
configuration in FRR. In this case, user should know the default local
preference value that is being used in the route calculation. Thus, adding a
new field 'default local preferece' in the show commands where a BGP path is
displayed.
When a BGP path is displayed in the above show commands, as-path does not
include the local AS. So, user has to execute another show command to display
the local-AS. To avoid this, adding a new field local-AS to above show commands.
Signed-off-by: Ameya Dharkar <adharkar@vmware.com>
label pool finalisation must be delayed after route deletion on bgp.
otherwise a crash will happen, while labels will be released.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
previous change was to fix rnh module in Zebra for leaked routes
this reverts that fix, so probably reintroduces the problem.
Signed-off-by: Lou Berger <lberger@labn.net>
when converting bgp fs entries to bgp pbr entries, the fields of the
flowspec are analysed. In the case src ip or dst ip is set to 0.0.0.0,
that field is ignored, thus preventing from injecting a rule that can
not be injected into the pbr. This can be done by avoiding mentioning
the field in the bitmask structure used to convert data to pbr entries.
PR=61620
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Acked-by: Emmanuel Vize <emmanuel.vize@6wind.com>
that new option will overwrite simpson draft. There is a new ecommunity
option whose type is 0x1 and subtype is 0xc. That option is defined
here on iana.org/assignments/bgp-extended-communities page:
- bgp-extended-communities.xhtml#trans-ipv4
It contains the IP address to redirect traffic to. The understanding of
the draft is the following one:
- if that community is only present, then the ip contained inside will
be chosen as nexthop.
- if that community is provided along with simpson extended community,
then only the new redirect ip draft will be used. however, both will be
displayed.
- in other cases, if there is only the simpson extended community, then
the nexthop IP of the NLRI will be chosen.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
When "default-originate ipv4" is configured, a type-5 route is installed in
the local node and advertised to the peer with auto-rd.
When the above was followed by configuring an RD in IP VRF, Type-5 are
generated for only the non-default routes.
Fixed this issue by withdrawing the default route with auto-rd and advertising
the route with confiured RD.
Signed-off-by: Kishore Aramalla karamalla@vmware.com
A while ago all FRR configuration commands were converted to use the
QOBJ infrastructure to keep track of configuration objects. This
means the configuration lock isn't necessary anymore because the
QOBJ code detects when someones tries to edit a configuration object
that was deleted and react accordingly (log an error and abort the
command). The possibility of accessing dangling pointers doesn't
exist anymore since vty->index was removed.
Summary of the changes:
* remove the configuration lock and the vty_config_lockless() function.
* rename vty_config_unlock() to vty_config_exit() since we need to
clean up a few things when exiting from the configuration mode.
* rename vty_config_lock() to vty_config_enter() to remove code
duplication that existed between the three different "configuration"
commands (terminal, private and exclusive).
Configuration commands converted to the new northbound model don't
need the configuration lock either since the northbound API also
detects when someone tries to edit a configuration object that
doesn't exist anymore.
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
When a VNI is unconfigured it deletes all of its import and export
route-targets. There is a export route-target link list and import
route-target linked list. There are redudant loops in the
route-target deletion code. In the first iteration it deleted the
route-target and freed the RT node, but not list node.
In the 2nd iteration it tries to free the RT node again, resulting in
the double free of RT node.
Signed-off-by: "Kishore Aramalla karamallavmware.com"
Null check of 'rn' returned by bgp_node_lookup() because it could be
deferenced afterwards into bgp_nexthop_get_node_info()
Signed-off-by: F. Aragon <paco@voltanet.io>
* The function bgp_router_id_zebra_bump() will check for active bgp
peers before chenging the router ID.
If there are established peers, router ID is not modified
which prevents the flapping of established peer connection
* Added field in bgp structure to store the count of established peers
Signed-off-by: kssoman <somanks@vmware.com>
Duplicate address detection configuration clis
under bgp l2vpn evpn config mode.
- Enabled/Disable (global knob) for feature.
- Configure cli for duplicate detection action
freeze and freze until time (auto-recovery).
Signed-off-by: Chirag Shah <chirag@cumulusnetworks.com>
Enable/disable duplicate address detection
there are 3 actions
warning-only: Default action which generates
only frr warning (syslog) to user for any
duplicate detecton
freeze: Permanently freezes address, manual
intervene required.
freeze with time: An address will recover once
the time has expired (auto-recovery).
Signed-off-by: Chirag Shah <chirag@cumulusnetworks.com>
The bgp_connected_set_node_info and bgp_connected_get_node_info
function names were slightly backwards lets fix them up
to bgp_node_set_bgp_connected_ref_info and bgp_node_get_bgp_connected_ref_info
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The bgp_distance_set_node_info and bgp_distance_get_node_info
function names were slightly backwards lets fix them up
to bgp_node_get_bgp_distance_info and bgp_node_set_bgp_distance_info
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The bgp_static_set_node_info and bgp_static_get_node_info
function names were slightly backwards rename to
bgp_node_get_bgp_static_info and bgp_node_set_bgp_static_info
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The bgp_aggregate_set_node_info and bgp_aggregate_get_node_info
functions names were slightly backwards, rename to
bgp_node_get_bgp_aggregate_info and bgp_node_set_bgp_aggregate_info
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The bgp_nexthop_set_node_info and bgp_nexthop_get_node_info
function names were slightly backwards, rename to bgp_node_set and get
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The show_adj_route_vpn function was incredibly hard to read because
of the incredibly deep indentation. fix this up some.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Cleanup the bgp_route_map_process_update code to be a bit
easier to read as that it approached the right side of the
80 column limit a whole bunch and became hard to read.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Fix the missed usage of bgp_static_get_node_info and also
cleanup the function around it that was using it to make
it a bit more readable.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The bgp_info data is stored as a void pointer in `struct bgp_node`.
Abstract retrieval of this data and setting of this data
into functions so that in the future we can move around
what is stored in bgp_node.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The ordering of data within the `struct bgp_node`
was causing extra padding of data. Moving the version
to a bit different spot allows for more efficient packing
of data.
Pre-change:
(gdb) p sizeof(struct bgp_node)
$1 = 152
(gdb)
Post-change:
(gdb) p sizeof(struct bgp_node)
$1 = 144
(gdb)
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
There is no reason that bgp should be including zebra
headers into it's code base, it is a violation of
their respective name spaces.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Missing endline was resulting in garbled output in vtysh in some cases, for example, when there were no peers configured and the user has issued "bgp disable-ebgp-connected-route-check" command.
Signed-off-by: Anton Degtyarev <anton@cumulusnetworks.com>
In rare cases when the default BGP instance is instantiated after VRF bgp instances (see comment to bgp_mplsvpn.c:vpn_leak_postchange_all() for an example), the "router bgp" command needs to call vpn_leak_postchange_all() to start the route leaking process. The issue was it was never checked if the "router bgp" command was used to create the default BGP instance or just to enter into "router bgp" command context. This resulted in vpn_leak_postchange_all() executed every time (and vpn routes re-announced to all peers) when the user was entering "router bgp" command context.
Signed-off-by: Anton Degtyarev <anton@cumulusnetworks.com>
if zebra is not started, then vrf identifiers are not available. This
prevents import/exportation to be available. This commit permits having
import/export available, even when zebra is not started.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
It's been a year since we added the new optional parameters
to instantiation. Let's switch over to the new name.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The motivation for this patch is to address a concerning behavior of
tx-addpath-bestpath-per-AS. Prior to this patch, all paths' TX ID was
pre-determined as the path was received from a peer. However, this meant
that any time the path selected as best from an AS changed, bgpd had no
choice but to withdraw the previous best path, and advertise the new
best-path under a new TX ID. This could cause significant network
disruption, especially for the subset of prefixes coming from only one
AS that were also communicated over a bestpath-per-AS session.
The patch's general approach is best illustrated by
txaddpath_update_ids. After a bestpath run (required for best-per-AS to
know what will and will not be sent as addpaths) ID numbers will be
stripped from paths that no longer need to be sent, and held in a pool.
Then, paths that will be sent as addpaths and do not already have ID
numbers will allocate new ID numbers, pulling first from that pool.
Finally, anything left in the pool will be returned to the allocator.
In order for this to work, ID numbers had to be split by strategy. The
tx-addpath-All strategy would keep every ID number "in use" constantly,
preventing IDs from being transferred to different paths. Rather than
create two variables for ID, this patch create a more generic array that
will easily enable more addpath strategies to be implemented. The
previously described ID manipulations will happen per addpath strategy,
and will only be run for strategies that are enabled on at least one
peer.
Finally, the ID numbers are allocated from an allocator that tracks per
AFI/SAFI/Addpath Strategy which IDs are in use. Though it would be very
improbable, there was the possibility with the free-running counter
approach for rollover to cause two paths on the same prefix to get
assigned the same TX ID. As remote as the possibility is, we prefer to
not leave it to chance.
This ID re-use method is not perfect. In some cases you could still get
withdraw-then-add behaviors where not strictly necessary. In the case of
bestpath-per-AS this requires one AS to advertise a prefix for the first
time, then a second AS withdraws that prefix, all within the space of an
already pending MRAI timer. In those situations a withdraw-then-add is
more forgivable, and fixing it would probably require a much more
significant effort, as IDs would need to be moved to ADVs instead of
paths.
Signed-off-by Mitchell Skiba <mskiba@amazon.com>
When we have a late registration of the Extended Nexthop capability
for BGP and the peer already has nexthop information stored, go
through and enable RA on the important interfaces.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
If we attempt to register nexthops before we have the zebra
connection, they will not be installed. After we have noticed
that we are up, re-install them.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Allow some debug notification when we are unable to talk
to zebra due to the connection not being there yet.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
This change is a fixup to -
7b5e18 - bgpd: use IP address as tie breaker if the MM seq number is the
same
And is being done in response to review comments. This commit brings no
functional change; simply moves around code for easier maintanence.
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
Same sequence number handling is specified by RFC 7432 -
[
If two (or more) PEs advertise the same MAC address with the same
sequence number but different Ethernet segment identifiers, a PE that
receives these routes selects the route advertised by the PE with the
lowest IP address as the best route.
If the PE is the originator of the MAC route and it receives the same
MAC address with the same sequence number that it generated, it will
compare its own IP address with the IP address of the remote PE and
will select the lowest IP. If its own route is not the best one, it
will withdraw the route.
]
To implement that specification this commit uses nexthop IP as a tie
breaker between two paths of equal seq number with lower IP winning.
Now if a local path already exists with the same sequence number but higher
(local-VTEP) IP it is evicted (deleted and withdrawn from the peers) and
the winning new remote path is installed in zebra. This is existing code
and handled implicitly via evpn_route_select_install.
If a local path is rxed from zebra with the same sequence as the
current remote winner it is rejected (not installed in the bgp
routing tables) and zebra is asked to re-install the older/remote winner.
This is a race condition that can only happen if bgp's add and zebra's add
cross paths. Additional handling has been added in this commit via
evpn_cleanup_local_non_best_route to take care of the race condition.
Ticket: CM-22674
Reviewed By: CCR-7937
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
This is needed to install the remote dst when a more preferred local
path is removed.
Ticket: CM-22685
Reviewed By: CCR-7936
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
local mac add/del comes from zebra. the hidden commands help verify
various race conditions between bgp and zebra.
Ticket: CM-22687
Reviewed By: CCR-7939
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
Problems were reported with the name of the default vrf and the
default bgp instance being different, creating confusion. This
fix changes both to "default" for consistency.
Ticket: CM-21791
Signed-off-by: Don Slice <dslice@cumulusnetworks.com>
Reviewed-by: CCR-7658
Testing: manual testing and automated tests before pushing
Both of these are testing/demo-style tools that don't make sense as part
of a normal installation. So don't install them.
NB: this is only the executables, libospfclient and the RFP code are not
affected.
Signed-off-by: David Lamparter <equinox@diac24.net>
FRR_DAEMON_INFO should now contain an array of 'frr_yang_module_info'
structures describing the YANG modules implemented by the daemon.
This array will be used by frr_init() function to load all YANG modules
and initialize the northbound callbacks during the daemon initialization.
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
* Cast when assigning should be to uint16_t
* Restored comment documenting strange behavior
* Further increased PREFIX_STRLEN to 80 chars
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
The tx_id_buf was not being set to anything in some cases,
make sure it's a null string before using.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
community_free, lcommunity_free and ecommunity_free are similar type of functions. Most of the places, these three are called together. The signature of community_free is different from other two functions. Modified the community_free API signature to align with other two functions to avoid any confusion. There is no functionality impact with this and this is just to avoid any confusion.
Testing: manual testing and show commands
Signed-off-by: Sri Mohana Singamsetty msingamsetty@vmware.com
as prefix is opaque for flowspec, and json needs to have a non empty
full of meaning value in prefix, the proposal is to encode the
displayable form of flowspec entry.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
zclient->redist[afi][type] is a hash table and not an integer since a
while ago when VRF support was introduced. As such, zclient->redist[][]
should never be manipulated directly, the vrf_bitmap_*() helper functions
should be used instead. This fixes a few crashes found by the CLI fuzzer.
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
The routing table data structure can create intermediate route nodes
during its normal operation, so we always need to check if the 'info'
pointer of a route node is NULL or not before dereferencing it.
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
The vnc_direct_del_rn_group_rd() function can be called with the 'afi'
parameter set to AFI_L2VPN on some specific cases. Remove the assert to
fix the crash.
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
Other parts of the rfapi code check if the 'rfg->rfapi_import_table'
pointer is NULL or not before using it. Do the same here to fix a crash
detected by the CLI fuzzer.
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
The rfapiDeleteRemotePrefixesIt() function checks on several places if
'p' is NULL or not. Introduce an additional NULL check to prevent a
crash from happening.
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
The rfapi code wasn't checking if strtoul() succeeded or not when parsing
the list of labels. Fix the affected commands by not allowing the user
to enter a non-numeric input.
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
Other parts of the rfapi code also check if these pointers are NULL or
not before using them.
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
The peer->group pointer is set only if the PEER_STATUS_GROUP flag is
set in the peer. Add a protection to prevent a NULL pointer dereference.
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
Like community_cmp() and ecommunity_cmp(), the lcommunity_cmp() function
also needs to handle NULL pointers for correct operation.
Without this fix, bgpd can crash when entering the following commands:
vtysh -c "configure terminal" -c "ip large-community-list standard WORD deny"
vtysh -c "configure terminal" -c "no ip large-community-list expanded WORD"
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
The bgp_damp_config_clean() function was deallocating some arrays without
resetting the variables that represent their sizes. This was leading to
some crashes because other parts of the code iterate over these arrays
by looking at their corresponding sizes, which could be invalid.
Fixes the following segfaults (which only happen under certain
circumstances):
vtysh -c "configure terminal" -c "router bgp 1" -c "bgp dampening"
vtysh -c "configure terminal" -c "router bgp 1" -c "no bgp dampening"
vtysh -c "configure terminal" -c "router bgp 1" -c "no bgp dampening 45"
vtysh -c "" -c "clear ip bgp dampening"
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
The ->hash_cmp and linked list ->cmp functions were sometimes
being used interchangeably and this really is not a good
thing. So let's modify the hash_cmp function pointer to return
a boolean and convert everything to use the new syntax.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Recent changes to the nht code in bgp caused us to actually
keep a true count of v6 nexthop paths when using v4 over v6.
This change introduced a race condition on shutdown on who
got to the bnc cache first( the v4 table or not ). Effectively
we were allowing the continued existence of the path->nexthop
pointing to the freed bnc. This was especially true when
we had route leaking. So when we free the bnc make sure
we clean up the path->nexthop variables pointing at it too.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
If you are using bgp unnumbered( or interface based peers )
when we detect an error give the user a bit more of a clue
what they may have done wrong.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Add the ability to aggregate routes to handle
extended communities. Make the actions similiar
to what we do for normal communities.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The evpn_vtep_ip_cmp function must return positive and negative
numbers for when we are doing sorted linked list inserts.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The purpose of adding a l2vni as an sorted list is
shot in the foot when the l2vni compare function only
returns 0 or 1. This will cause subtle crashes when
we add sorted and we end up with multiple list node pointing
to the same thing.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Add the '[no] flood <disable|head-end-replication>' command
to the l2vpn evpn afi/safi sub commands for bgp. This command
when entered as 'flood disable' will turn off type 3 route
generation for the transmittal of the type 3 route necessary
for BUM replication on the remote VTEP. Additionally it will
turn off the BUM handling via the new zebra command,
ZEBRA_VXLAN_FLOOD_CONTROL.
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Allow the modification of whether or not we will allow
BUM flooding on the vxlan bridge. To do this allow
the upper level protocol to specify via the ZEBRA_VXLAN_FLOOD_CONTROL
zapi message.
If flooding is disabled then BUM traffic will not be forwarded
to other VTEP's.
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
1. "show bgp ipv4 json"
- Added "network" field which displays a prefix in 'prefix/prefixlen' format.
2. "show bgp ipv6 json"
- Added "network" field which displays a prefix in 'prefix/prefixlen' format.
- JSON does not have "prefix", "prefixLen" fields which are present in IPv4
command. Added these fields as they are useful.
3. "show bgp ipv4/ipv6 neighbor <neighbor_addr> advertised-routes json"
- Added "network" field.
4. "show bgp ipv4/ipv6 summary json"
- Added "pfxSnt" for peers. This count is obtained from corresponding
update_subgroup.
5. "show bgp neighbor json"
- Added "sentPrefixCounter"
Signed-off-by: Ameya Dharkar <adharkar@vmware.org>
When executing vpn route-map config for importation, the running-config
records vrf import route-map instead. Actually, this is a problem when
restarting configuring when using vpn route-map. The choice is done to
move to vrf format, when at least one import list is created for vrfs.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
When a subgroup splits to form a new subgroup because of policy changes
for a peer, new subgroup copies adj out(state about advertised routes)
from the parent subgroup. At the same time, it should also copy
scount(advertised prefix count) to the new subgroup for the count to be
in sync with the adj_out for the subgroup.
Signed-off-by: Ameya Dharkar <adharkar@vmware.org>
When cleaning up a interface string, from the linked list we were
dropping the name pointer which held the allocated martian address
intf string.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Track the memory type associated with the bartian address
interface a bit better, instead of using MTYPE_TMP.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Do a straight conversion of `struct bgp_info` to `struct bgp_path_info`.
This commit will setup the rename of variables as well.
This is being done because `struct bgp_info` is not descriptive
of what this data actually is. It is path information for routes
that we keep to build the actual routes nexthops plus some extra
information.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
When we add/remove peers we need to do a bit better job
of tracking them in the bgp->peerhash.
1) When we have the doppelganger take over, make sure the
winner is the one represented in the peerhash.
2) When creating the doppelganger, leave the current one
in place instead of blindly replacing it.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Cleanup calls where we were passing in the su for
peer creation a tiny bit.
Creating a peer from the cli will always have a conf_if *or*
a su but not both. While a doppelganger will have both.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The bgp->peerhash is a secretive bit of data that we use
to quickly lookup data about peers. Unfortunately
since we had not way to look at it, we had no way
of knowing if it had gotten in or out of sync.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
1. "show bgp ipv4 json"
- Corresponding CLI has "network" field which displays a prefix in
'prefix/prefixlen' format. Added this "network" field to JSON as well.
- Following fields have different names in JSON and CLI.
CLI JSON
metric med
locPrf localPref
path aspath
Added fields "metric", "locPrf" and "path" in JSON for CLI/JSON
consistency. Older JSON fields med, localPref, aspath will be
deprecated in future.
2. "show bgp ipv6 json"
- Similar changes as "show bgp ipv4 json"
- JSON does not have "prefix", "prefixLen" fields which are present in IPv4
command. Added these fields as they are useful.
3. "show bgp ipv4/ipv6 neighbor <neighbor_addr> advertised-routes json"
- Added "network" field.
- Added locPrf, path fields for CLI/JSON consistency. localPref, aspath will
be deprecated in future.
4. "show bgp ipv4/ipv6 summary json"
- Added "pfxRcd" for CLI/JSON consistency.
"prefixReceivedCount" will be deprecated in future.
- Added "pfxSnt" for peers. This count is obtalned from corresponding
update_subgroup. This needed a fix in the code where we copy fields
for a split update_subgroup from the parent update_subgrp.
New subgrp should inherit subgrp->scount(Count of advertized prefixes)
of the parent subgrp.
5. "show bgp neighbor json"
- Added "sentPrefixCounter"
6. "show bgp ipv4/ipv6 <prefix> json"
- Added "metric" field for CLI/JSON consistency.
"med" will be deprecated in future.
Signed-off-by: Ameya Dharkar <adharkar@vmware.org>
The existing commands "ip as-path", "ip community list", "ip extcommunity
list" & "ip largecommunity list" is used to configure both for ipv4 and
ipv6. So the prefix "ip" is removed from these commands.
All the configuration, show related configuration, show running config
& boot up with write memory is also verified with the provided fix.
Signed-off-by: Sarita Patra <saritap@vmware.com>
When this description code was added, it was all dead code since none of
the bools that checked if the communities were present were ever changed
from 0.
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
Signed-off-by: Don Slice <dslice@cumulusnetworks.com>
Testing Done: bgp-smoke completed with no new failures
While testing 5549 support using global addresses, discovered that
ipv6 nexthop tracking thru a route-reflector didn't work. Since
the next-hop used for remote nexthops resolves to the link-local
of the route-reflector, we need to track it in order to react to
interface down events. Also tripped over a crash in certain cases
which is also resolved in this fix.
During peer startup there exists the possibility that both
locally and remote peers try to start communication at the
same time. In addition it is possible for local configuration
to change at the same time this is going on. When this happens
try to notice that the remote peer may be in opensent or openconfirm
and if so we need to restart the connection from both sides.
Additionally try to write a bit of extra code in peer_xfer_conn
to notice when this happens and to emit a error message to
the end user about this happening so that it can be cleaned up.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Problem reported that frr-relaod.py was not installing an aggregate
properly. Problem was actually that frr-reload.py does the command
twice, and the second time the aggregate command was entered, it would
appear in the config but the aggregate was removed from the bgp table
and not advertised to peers. Solved by noticing when an aggregate
was marked for deletion (info_invalid) and allowing the re-entry if
the old one was being removed.
Ticket: CM-22509
Signed-off-by: Don Slice <dslice@cumulusnetworks.com>
Problem encountered where using the aggregate-address command in an
evpn environment did not work properly. Depending on the order of
actions, the aggregate may not be created or removed when either the
commands were issued or routes come and go.
Ticket: CM-20585
Signed-off-by: Don Slice <dslice@cumulusnetworks.com>
Please note this is a Proof of Concept and not actually something
that is ready to commit at this point. The file tools/lua.scr
contains some documentation on how we expect it to work currently.
Additionally not all bgp values have been hooked up into the
ability to lua script yet.
There is still significant work to be done here:
1) Add the ability to pass in more data and to adjust the return values
as appropriate.
To set it up:
1) copy tools/lua.scr into /etc/frr (or whereever the config
directory is )
2) Create a route-map match command:
!
router bgp 55
neighbor 10.50.11.116 remote-as external
!
address-family ipv4 unicast
neighbor 10.50.11.116 route-map TEST in
exit-address-family
!
route-map TEST permit 10
match command mooey
!
3) In the lua.scr file make sure that you have a function
named 'mooey' ( as the above example does ):
function mooey ()
zlog_debug(string.format("Family: %d: %s %d ifindex: %d aspath: %s localpref: %d",
prefix.family, prefix.route,
nexthop.metric, nexthop.ifindex, nexthop.aspath, nexthop.localpref))
nexthop.metric = 33
nexthop.localpref = 13
return 3
end
This example script modifies the metric and localpref currently. I've also provided
a zlog_debug function in lua to allow some simple debugging.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
In bgp if we have not configured bgp we were ignoring
interface based callbacks. Leading to states where
we may not be processing interface information.
Leading to states where we do not actually keep
ifp data. As an example:
Suppose vrf A and vrf B. A has interface swp1.
At the same time we only have a `router bgp 9 vrf B`
When we received the callback for moving swp1
from vrf A to vrf B we were not processing the
move at all and BGP would not consider the interface
part of vrf B at all.
This commit makes bgp pay attention to interface
events irrelevant if bgp is using that vrf. This
is now consistent with how the lib/if* expects
to work and the rest of the daemons in FRR.
Signed-off-by: Donald Sharp <sharpd@cumulsnetworks.com>
Wrapper the get/set of the table->info pointer so that
people are not directly accessing this data.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The bgp_nexthop_cache data is stored as a void pointer in `struct bgp_node`.
Abstract retrieval of this data and setting of this data
into functions so that in the future we can move around
what is stored in bgp_node.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The bgp_connected_ref data is stored as a void pointer in `struct bgp_node`.
Abstract retrieval of this data and setting of this data
into functions so that in the future we can move around
what is stored in bgp_node.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The bgp_static data is stored as a void pointer in `struct bgp_node`.
Abstract retrieval of this data and setting of this data
into functions so that in the future we can move around
what is stored in bgp_node.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The bgp_distance data is stored as a void pointer in `struct bgp_node`.
Abstract retrieval of this data and setting of this data
into functions so that in the future we can move around
what is stored in bgp_node.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The aggregate data is stored as a void pointer in `struct bgp_node`.
Abstract retrieval of this data and setting of this data
into functions so that in the future we can move around
what is stored in bgp_node.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Track the refcount a bit differently as that it is possible
to get into situations where we have multiple calls for the
same ifc. So let's just keep a list of the ifc's off of
each `struct bgp_addr` and then keep the hash entry based
upon list count or not.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The `struct bgp_addr` is not needed for anything other than
the address hash. Isolate this data structure so that it
is not polluting up the name space.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
There were checks for null pointer after being dereferenced. Checks have
been removed (we've discussed the no need of adding assert()'s because
of similar code not requiring them).
Signed-off-by: F. Aragon <paco@voltanet.io>
All I can see is an unneccessary complication. If there's some purpose
here it needs to be documented...
Signed-off-by: David Lamparter <equinox@diac24.net>
Currently we only support RFC 5549 in bgp via
using the `neighbor swp1 interface remote-as ...`
command. This causes the extended capability
data to be traded as part of the open message.
Additionally at that point in time we notify
zebra to turn on the RA code for that interface
so that the zebra trick of turning the v6 nexthop
into a 169.254.0.1 nexthop and adding a neighbor
entry works.
This code change does 2 things:
1) Modify bgp to pass the extended capability
if we are attempting to establish a v4/unicast
session over a v6 peer. In the past we limited
this to just the LL based peer.
2) Modify the nexthop tracking code to notice
when it receives nexthop data about the global v6
peer to turn on RA code on those interfaces we will
be using. This will allow the v4 route with a v6
nexthop received in zebra to auto translate this
correctly.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Corrections so that the BGP daemon can work with the label manager properly
through a label-manager proxy. Details:
- Correction so the BGP daemon behind a proxy label manager gets the range
correctly (-I added to the BGP daemon, to set the daemon instance id)
- For the BGP case, added an asynchronous label manager connect command so
the labels get recycled in case of a BGP daemon reconnection. With this,
BGPd and LDPd would behave similarly.
Signed-off-by: F. Aragon <paco@voltanet.io>
For tracking the last state of the penalty (struct bgp_damp_info) a 'double'
type was used instead of using the 'unsigned int' being used in the structure.
Detected using ./configure CFLAGS=-Wfloat-equal CC=clang
Signed-off-by: F. Aragon <paco@voltanet.io>
When using `no bgp fast-external-failover` and a interface moves
from one vrf into another we would not fully process the change.
Fix this code path.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The peer->nexthop.ifp pointer must be set when parsing the
attributes in bgp_mp_reach_parse, notice this
and fail gracefully.
Rework bgp_nexthop_set to remove the HAVE_CUMULUS and to
fail the nexthop_set when we have a zebra connection and
no ifp pointer, as that not havinga zebra connection and
no ifp pointer is legal.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
When the origin changed we must honor and update the aggregate
to the peer. This code adds a bit of code to the bgp_aggregate_info_same
code to see if the origin has changed and to indicate that it has.
Fixes: #2993
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Whether or not RPKI is enabled during build shouldn't really influence
vtysh; the user can always manually install bgpd_rpki.so later and it
should work. This also means that the behaviour of "RPKI module not
loaded" is consistent regardless of whether it was a compile-time or
runtime decision.
Signed-off-by: David Lamparter <equinox@diac24.net>
config.h (or, transitively, zebra.h) must be the first include file
listed for autoconf things like _GNU_SOURCE and _POSIX_C_SOURCE to work
correctly.
Signed-off-by: David Lamparter <equinox@diac24.net>
Since we're now building through one large Makefile, we can easily put
things with their daemons and crossreference nicely.
Signed-off-by: David Lamparter <equinox@diac24.net>
Note: no more --with-rfp-path on configure - badly messing with the
build system like this really isn't how to do a conditional external
dependency.
Signed-off-by: David Lamparter <equinox@diac24.net>
Switch bgp and ripngd to use the new aggregate table and
route data structures. This was mainly a search and replace
operation.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Problem reported that some bgp and ospf json commands did not return
any json output at all if the bgp/ospf instance did not exist.
Additionally, some bgp and ospf json commands did not return any json
output if the instance existed but no neighbors were defined. This
fix makes these commands more consistent in returning empty braces for
json output and issue a message if not using json output. Additionally,
made the flag "use_json" a bool to make it consistent since previously,
it had been defined as an int, char, u_char, and bool at various places.
Ticket: CM-21040
Signed-off-by: Don Slice <dslice@cumulusnetworks.com>
Because a VRF name can be used for default VRF, or an alias of an
already created VRF can be passed as parameter, the default VRF name
must be found out. This avoids creating double BGP instances for
example.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
The Vrf aliases can be known with a specific hook. That hook will then,
from zebra propagate the information to the relevant zapi clients.
The registration hook function is the same for all daemons.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Coded as part of #2684 and most code written while participating at
BornHack@2018.
bgp_route.c: Changes regarding adding explanations for the IANA
well-known communities added in #2684
Signed-off-by: Christoffer <netravnen@gmail.com>
While perusing CONFDATE I noticed that we had a couple
CONFDATE 201805, which we were not picking up( for other
reasons and fixed in a different PR ). But upon investigation
of these I noticed that the commits where in 201805, so these
CONFDATES should be in 2019
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
If a command returns a nonzero exit status and VTYSH has a corresponding
command, VTYSH will skip executing its own version. If this happens in a
command that changes CLI nodes we get node desynchronization.
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
Several zlog_warns were being used to tell the end
user that bgp had detected a bug. These all look like information
added during development that can be noted as debugs or logged
as an error situation.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
A warn with a backtrace does not need another warn
to file an issue with Quagga, so just remove it.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Scan all bgp vrf instances and respective L3VNI against the VNI which is being configured.
Ticket:CM-21859
Testing Done:
Configure l3vni,
try to configure same vni as l2vni under router bgp, address-family
l2vpn evpn.
The configuration is rejected.
show evpn vni
VNI Type VxLAN IF # MACs # ARPs # Remote VTEPs Tenant VRF
4001 L3 vx-4001 0 0 n/a vrf1
TOR(config)# router bgp 5546
TOR(config-router)# address-family l2vpn evpn
TOR(config-router-af)# vni 4001
% Failed to create VNI
Signed-off-by: Chirag Shah <chirag@cumulusnetworks.com>
Root Cause: In the function bgp_show_table(), we are creating a
json object and a json array with the same name as “json_paths”.
First it will create a json object variable "json_paths" pointing
to the memory allocated for the json object. Then it will create
a json array for each bap node rn (if rn->info is available) with
the same name as json_paths. Because of this, json_paths which was
pointing to the memory allocated for the json object earlier, now
will be overwritten with the memory allocated for the json array.
As per the existing code, at the end of each iteration loop of bgp
node, it will deallocate the memory used by the json array and
assigned NULL to the variable json_paths. Since we don’t have the
pointer pointing to the memory allocated for json object, will be
not able to de-allocate the memory, which is a memory leak here.
Fix: Removing this json object since it is never getting used in
this function.
Testing: Reproduced the memory leak with valgrind.
With the fix, memory leak gets resolved and checked with valgrind.
Signed-off-by: Sarita Patra saritap@vmware.com
In some situations rtrlib does not release the locks for its internal
data structures before calling a callback. This can lead to deadlocks
when a lot of routes must be revalidated because the sync socket buffer
will fill up and block the rtrlib thread. The bgpd main thread then
waits for rtrlibs internal locks to be released indefinitely.
This is fixed by using nonblocking sockets instead of blocking ones and
setting a flag to revalidate everything, if it would block.
Signed-off-by: Marcel Röthke <marcel.roethke@haw-hamburg.de>
* Added parameter in bgp_redistribute_set() to indicate change
in redistribute option
* If there is change, call bgp_redistribute_unreg() to withdraw routes
Signed-off-by: kssoman <somanks@vmware.com>
Make the wart slightly less bad... also there is still a possible write
after free here. This needs to be fixed again, properly, by some
structure changes.
Signed-off-by: David Lamparter <equinox@diac24.net>
e.g.
pimd/pim_oil.c: In function ‘pim_channel_oil_dump’:
pimd/pim_oil.c:51:19: error: ‘%d’ directive writing between 1 and 11 bytes into a region of size 10 [-Werror=format-overflow=]
Build on gcc-8.2.0 is warning-free after this patch.
Signed-off-by: David Lamparter <equinox@diac24.net>
Avoid memory leak in bgp flowspec list.
Usage of bool parameter instead of int, to handle the number of entries
PBR.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Implement procedures similar to what is specified in
https://tools.ietf.org/html/draft-malhotra-bess-evpn-irb-extended-mobility
in order to support extended mobility scenarios in EVPN. These are scenarios
where a host/VM move results in a different (MAC,IP) binding from earlier.
For example, a host with an address assignment (IP1, MAC1) moves behind a
different PE (VTEP) and has an address assignment of (IP1, MAC2) or a host
with an address assignment (IP5, MAC5) has a different assignment of (IP6,
MAC5) after the move. Note that while these are described as "move" scenarios,
they also cover the situation when a VM is shut down and a new VM is spun up
at a different location that reuses the IP address or MAC address of the
earlier instance, but not both. Yet another scenario is a MAC change for an
attached host/VM i.e., when the MAC of an attached host changes from MAC1 to
MAC2. This is necessary because there may already be a non-zero sequence
number associated with MAC2. Also, even though (IP, MAC1) is withdrawn before
(IP, MAC2) is advertised, they may propagate through the network differently.
The procedures continue to rely on the MAC mobility extended community
specified in RFC 7432 and already supported by the implementation, but
augment it with a inheritance mechanism that understands the relationship
of the host MACIP (ARP/neighbor table entry) to the underlying MAC (MAC
forwarding database entry). In FRR, this relationship is understood by the
zebra component which doubles as the "host mobility manager", so the MAC
mobility sequence numbers are determined through interaction between bgpd
and zebra.
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
Reviewed-by: Donald Sharp <sharpd@cumulusnetworks.com>
Reviewed-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
The code for this was always there but was not kicking in because of an
incorrect dependency on is_evpn_enabled. This API attempts to locate the
default instance from bgp_master's instance list. Only the instance
currently being deleted has already been removed from the instance list
by the time bgp_delete->bgp_zebra_instance_deregister is executed.
Symptom of this bug used to show up when a default instance is deleted
and created again. In that case bgp_zebra_instance_register would not be
effective as zebra ignores the register as dup (dereg didn't happen in the
first place) so bgpd wouldn't reload already configured L2-VNIs.
root@cel-sea-03:~# net show bgp l2vpn evpn vni |grep 1000
* 1000 L2 169.253.0.11:9 6646:1000 6646:1000 vrf1
root@cel-sea-03:~# grep "router bgp" /etc/frr/frr.conf
router bgp 6646
root@cel-sea-03:~# sed -i 's/6646/6656/' /etc/frr/frr.conf
root@cel-sea-03:~# grep "router bgp" /etc/frr/frr.conf
router bgp 6656
root@cel-sea-03:~# systemctl reload frr
root@cel-sea-03:~# net show bgp l2vpn evpn vni |grep 1000
root@cel-sea-03:~#
Fix simply changes the order of dereg to make
bgp_zebra_instance_deregister actually happen (by doing it before the
default instance is removed from the master list).
Ticket: CM-21566
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
In bgp_keepalives.c, it was noticed that we were
ensuring that we called an intialization function first,
but this is a development escape in that once this
was fixed we never see it. So if a developer moves
this assumption around, let's crash the program and
lead them to this spot instead of silently ignoring
the problem.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
There exists a few places where actual debugs were being
displayed as warns. Convert them over to debugs and
guard as appropriate.
Signed-off-by: Donald Sharp <sharpd@cumulsunetworks.com>
backet->data must be non-NULL( look at hash_get ) as such
we do not need to check for NULL values for this when
we retrieve data from the backet.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
* Use the correct license header
* Stop headers from including themselves
* Use uniform relative include conventions
* Ensure that sources include what they use
* Turn off clang-format around struct array blocks
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
Classful networking has been obsolete for ages and there is currently an
inconsistency between `show ip route` and `show bgp`, where the first
one always displays the CIDR mask while the second one hides classful
network masks.
This commit adjusts the behavior of `show bgp` to always show the CIDR
mask for a route, even when it is classful.
Signed-off-by: Pascal Mathis <mail@pascalmathis.com>
Problem reported that when systemctl restart networking or switchd
performed, not all imported prefixes were successfully restored.
Ticket: CM-21684
Signed-off-by: Don Slice <dslice@cumulusnetworks.com>
Ensure that the presence of L3VNI is checked before we generate
Router MAC and L3 Route Target extended communities. Without this
check, the router would send an all-zeros RMAC in some situations,
which may cause problems for receivers.
Ticket: CM-21014
Testing Done:
a) Verification of failed scenario
b) Interop verification by Scott Laffer
c) evpn-smoke
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
There is no need to check for failure of a ALLOC call
as that any failure to do so will result in a assert
happening. So we can safely remove all of this code.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Don't show BFD commands with timers since it might confuse users
("show running-config" won't display timers in client daemons anymore),
but keep accepting this command from previous configurations.
Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
When BFD timers are configured, don't show it anymore in the daemon
side. This will help us migrate the timers command from daemons to
`bfdd`.
Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
When calling route_map_finish, every place that we do we must
first set the deletion event to NULL, or we will create an infinite
loop, if we are using the delayed route-map application code.
As such we might as well just make the route_map_finish code
do this work, as that there is really no viable alternative here
and route_map_finish should only be called on shutdown.
This fixes an infinite loop in zebra on shutdown when there
are route-maps.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
When a bgp instance is stopped, with a `no router bgp..`
make sure any timers associated with the instance are stopped
as well.
This issue was discovered when a customer issued a `no router bgp`
while a maxmed timer was operative. The max-med timer used the
`struct bgp *` as the passed in value for the thread. The
thread eventually popped after the cleanup and attempted to use
data off in lala land and crashed
Ticket: CM-21895
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The dn_count for dynamic Peers was not being updated when
using json output.
Signed-off-by: Dongling Duan <dlduan@amazon.com>
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Updated the list with listed well-known communities from document IANA's https://www.iana.org/assignments/bgp-well-known-communities/bgp-well-known-communities.txt with last update date as of 2018-03-07.
- GRACEFUL_SHUTDOWN moved to 2nd entry in all lists in touched code.
- Added ACCEPT_OWN - [RFC7611]
- Added ROUTE_FILTER_TRANSLATED_v4 - currently [draft-l3vpn-legacy-rtc]
- Added ROUTE_FILTER_v4 - currently [draft-l3vpn-legacy-rtc]
- Added ROUTE_FILTER_TRANSLATED_v6 - currently [draft-l3vpn-legacy-rtc]
- Added ROUTE_FILTER_v6 - currently [draft-l3vpn-legacy-rtc]
- Added LLGR_STALE - currently [draft-uttaro-idr-bgp-persistence]
- Added NO_LLGR - currently [draft-uttaro-idr-bgp-persistence]
- Added accept-own-nexthop - currently [draft-agrewal-idr-accept-own-nexthop]
- Added BLACKHOLE - [RFC7999]
- Added NOPEER - [RFC3765]
Avoid displaying default configured local preference as
part of bgp route's detail output.
Local preference is for iBGP learnt route's. The value could be
default (100) or configured value (via routemap or local pref config cmd).
show bgp afi safi (brief output) does not display,
if the local pref attribute is not set.
Similarly, show bgp afi safi detail route output should display
if the the attribute is set, and should not display configured value.
This way both output would be consistent.
The configured local preference can be seen via running-config.
Ticket:CM-12769
Reviewed By:
Testing Done:
eBGP output:
show bgp ipv4 45.0.3.0/24
BGP routing table entry for 45.0.3.0/24
Paths: (1 available, best #1, table default)
Advertised to non peer-group peers:
MSP1(uplink-1) MSP2(uplink-2)
Local
0.0.0.0 from 0.0.0.0 (27.0.0.9)
Origin incomplete, metric 0, weight 32768, valid,sourced, bestpath-from-AS Local, best
AddPath ID: RX 0, TX 7
Last update: Thu Jul 26 02:10:02 2018
iBGP output:
show bgp ipv4 unicast 6.0.0.16/32
BGP routing table entry for 6.0.0.16/32
Paths: (1 available, best #1, table default)
Not advertised to any peer
Local
6.0.0.16 (metric 20) from tor-12(6.0.0.16) (6.0.0.16)
Origin incomplete, metric 0, localpref 100, valid, internal, bestpath-from-AS Local, best
AddPath ID: RX 0, TX 13
Last update: Thu Jul 26 05:26:18 2018
Signed-off-by: Chirag Shah <chirag@cumulusnetworks.com>
The bi->net pointer that is being unlocked had a commit
that removed the `bi->net = NULL;` recently. This code
was preventing a use after free crash being experienced
in other code paths. While commit 37e679629f was fixing
a different code path crash.
Make the parent->net pointer aware it may be locked/freed
from multiple places and to not NULL the pointer to it
unless we have actually freed the data.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
There is a default BGP VPN and BGP VRF instance in L3VPN configuration.
The routes are imported and exported between BGP VPN and BGP VRF.
Suppose there is one route in BGP VRF and exported to BGP VPN.
In BGP VPN there is bgp_info struct with bgp_info_extra struct has parent pointer pointing to the bgp_info of BGP VRF.
We take the lock for bgp_node and bgp_info of BGP VRF in the context of BGP VPN.
bgp_info has a back pointer to bgp_node via net.
Now when we have done "no rd vpn" in BGP VRF then in bgp_info_extra_free we have to free the parent resources.
In this context only unlocking is required. It should not set the BGP VRF (bgp_info->net) to NULL.
Signed-off-by: vishaldhingra vdhingra@vmware.com
Adding EVPN prefix of Type 2, 3 and 5 routes to bgp updates
prefix filters.
Ticket:CM-14476
Testing Done:
Configure multiple evpn options under 'debug bgp updates prefix'.
Below is the running-config output.
MAC-IP route with just MAC:
debug bgp updates prefix l2vpn evpn type macip mac
00:02:00:0a:0a:0a
MAC-IP route with MAC and IP:
debug bgp updates prefix l2vpn evpn type macip mac
00:02:00:00:00:0c ip 45.0.1.9
MAC-IP route with just MAC and IPv6:
debug bgp updates prefix l2vpn evpn type
macip mac 00:02:00:00:00:0a ip 2001:fee1:0:1::8
Type-3:
debug bgp updates prefix l2vpn evpn type multicast ip 27.0.1.19
Type-5:
debug bgp updates prefix l2vpn evpn type prefix
ip 2060:1:1:1::/64
Signed-off-by: Chirag Shah <chirag@cumulusnetworks.com>
When configuring an interface, the no local-install any command appears,
and leads to confusions. because the effect of that command differs if
it is executed after local-install <interfaces> or before executing
local-install <interfaces>, the proposal fix here is to suppress that
command from the vty available commands.
PR=59595
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Acked-by: Alain Ritoux <alain.ritoux@6wind.com>
because the IP destination criterium may match several entries, the show
command may return more than one entry.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
When adding an entry, a check is done in order to flush previously
configured entries. The whole parameters are checked so as to not remove
some entries that have ipset entries equal, but not iptable settings.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Instead of relying on jhash_1word for some parameters that are not 32
bit size, the jash(pointer, len) function is used.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Because one flowspec entry can create 1-N bgp pbr entries, the list is
now updated and visible. Also, because the bgp_extra structure is used,
this list is flushed when the bgp_extra structure is deleted.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
EVPN ND ext community support NA flag R-bit, to have proxy ND.
Set R-bit in EVPN NA if a given router is default gateway or there is a
local
router attached, which can be determine based on local neighbor entry.
Implement BGP ext community attribute to generate and parse R-bit and
pass along zebra to program neigh entry in kernel.
Upon receiving MAC/IP update with community type 0x06 and sub_type 0x08,
pass the R-bit to zebra to program neigh entry.
Set NTF_ROUTER in neigh entry and inform kernel to do proxy NA for EVPN.
Ref:
https://tools.ietf.org/html/draft-ietf-bess-evpn-na-flags-01
Ticket:CM-21712, CM-21711
Reviewed By:
Testing Done:
Configure Local vni enabled L3 Gateway, which would act as router,
checked
show evpn arp-cache vni x ip <ip of svi> on originated and remote VTEPs.
"Router" flag is set.
Signed-off-by: Chirag Shah <chirag@cumulusnetworks.com>
This commit removes various parts of the bgpd implementation code which
are unused/useless, e.g. unused functions, unused variable
initializations, unused structs, ...
Signed-off-by: Pascal Mathis <mail@pascalmathis.com>
The current behavior of the `bgp default shutdown` command is to set the
state of all newly configured peers to shutdown. This leads to a problem
when restarting bgpd, because all peers will then be seen as newly
configured, which leads to all peers being set to shutdown after each
restart.
This behavior is undesired and not common when comparing the
implementation against other vendors. This commit moves the `bgp default
shutdown` configuration underneath the peer-group and peer
configuration, to ensure that existing peers will not be set to shutdown
after a daemon restart.
Signed-off-by: Pascal Mathis <mail@pascalmathis.com>
In Symmetric routing case, L3VNI stores evpn MAC_IP routes
as IP_PREFIX routes in associated bgp_vrf and fib.
When vxlan device for l3vni goes down, triggers l3vni delete
in bgp.
As part l3vni delete, evpn ip prefix routes associated
with the vni need to be withdrawn from zebra as well
bgpinfo needs to be freed.
bgp_delete does not free bgp_info associated
to evpn ip prefix routes (link to bgp_vrf).
Call to uninstall_evpn_route_entry_in_vrf() properly
cleanup bgp_info as well triggers appropriate updates.
Ticket:CM-21443
Testing Done:
On DUT, bringup symmetric routing configuration, learn
EVPN Type-2 and Type-3 Routes.
Type-2 MAC_IP routes will be stored as ip_prefix in vrf table
during l3vni bring up.
Remove L3vni, deletes all ip_prefix routes from the zebra, kernel
vrf route table and bgp_info is freed.
Check the show bgp memory stats for bgp_info post l3vni flap.
Signed-off-by: Chirag Shah <chirag@cumulusnetworks.com>
The warning given by PVS-Studio is related to per-element overflow (there is
no real overflow, because of how elements are mapped in the union). This
same warning is typically reported by Coverity, too.
Signed-off-by: F. Aragon <paco@voltanet.io>
If a cache server was added after rpki was started it's tr_socket would
not be initialized. This would lead to a segfault if the rtr manager
ever decides to switch to that socket or if rpki support is stopped.
Signed-off-by: Marcel Röthke <marcel.roethke@haw-hamburg.de>
Issue 2473
VRF-VPN route-leaking configuration commands assumed existence of
default bgp instance. If a non-default vrf configuration occurred
before the default vrf configuration, it would result in a (NULL)
dereference to the non-existent default vrf.
This change 1) detects non-existence of default vrf and skips corresponding
RIB updating that normally occurs during configuration changes of the
route-leaking parameters; and 2) when the default bgp instance is defined,
runs the deferred RIB updating.
In vpn_leak_to_vrf_update_all(), replace early return if bgp_vpn is NULL
with assert, as the former would lead to subtly wrong RIB contents. The
underlying issue that led to commit 73aed5841a
should be fixed by the above changes.
Signed-off-by: G. Paul Ziemba <paulz@labn.net>
This correction fixes three bugs detected by Clang scan:
Bug Group: Logic error
Bug Type: Dereference of null pointer
File: bgpd/bgp_evpn.c
Function: bgp_evpn_unconfigure_import_rt_for_vrf
Line: 4246
File: isisd/isis_spf.c
Function: isis_print_paths
Line: 69 (two bugs of same type in one line)
Signed-off-by: F. Aragon <paco@voltanet.io>
In order to make EVPN behavior work without special casing
the code, bring the evpn commands under HAVE_CUMULUS into
the fold.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The return value can be ignored because in case of error both the 'afi'
and 'safi' variables are set to the invalid values (AFI_MAX, SAFI_MAX),
and the invalid values are handled properly afterwards in the 'default'
blocks.
Signed-off-by: F. Aragon <paco@voltanet.io>
Some values for icmp type/code can not be encoded like port source or
port destination. This is the case of 0 value that is authorized for
icmp.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
As the other enumerate list, icmp type and code are handled as the other
combinations. The icmp type and code options are the last options to be
injected into PBR. If icmp type is present only, all the filtering will
apply to this icmp type. if icmp code is present only, then all the
combination will be done with icmp type ranging from 0 to 255 values.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
The recursive algorithm was taking into account the fact that all the
bpof structures were filled in. Because the dscp value was not given,
the pkt_len parsing could not be achieved. Now the iteration takes into
account each type according to the previous one, thus guaranting all
parameters to be parsed.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
The flowspec fragment attribute is taken into account to be pushed in
BGP policy routing entries. Valid values are enumerate list of 1, 2, 4,
or 8 values. no combined value is supported yet.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
As fragment bitmask and tcpflags bitmask in flowspec protocol is encoded
in the same way, it is not necessary to differentiate those two fields.
Moreover, it overrides the initial fragment limit set to 1. It is now
possible to handle multiple framgent values.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
The packet length can be injected from fs entry with an enumerate list;
the negation of the value is also taken into account.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
So as to add or remove entries with flowspec or operations like tcp
flags or dscp enum list, a mechanism is put in place that adds
recursivity.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
If one dscp value or an enumerate list of or values of dscp are
provided, then the bgp pbr entries created will take into account the
dscp values.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Before adding/removing to zebra, flowspec entries parses the list of
combinations or avaialble and creates contexts in order to be injected
to zebra.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
tcp flags combinations ( or enumerates) are hosted in a structure that
will be analysed later, when wanting to inject that information to
zebra.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
The flowspec enumerate list can either be and values or or values.
In the latter case, a list is created that will be used later.
Also, the API supports the check for both and or or operations. This API
does not permit to handle both and and or operations at the same time.
The list will have to be either and or or. An other API retrieves the
operator unary value that is used: and or or. or 0 is the two operators
are used at the same time.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Because the Flowspec entries are parsed first, then injected to Zebra,
there are cases where the install feedback from zebra is not received.
This leads to unnecessary add route events, whereas one should be
enough.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Those flags can be shared between BGP and Zebra. That is why
those flags are moved to common pbr.h header file.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
To handle FS params between FS RIB and BGP PBR entities, a structure
intermediate named bgp_pbr_filter is used, and contains all filtering
information that was before passed as a parameter.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
To know which entry is set/unset, a debug handler is present, that
displays which entry is injected/removed to/from zebra.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
It is possible for flowspec entries containing ICMP rule to insert PBR
entries based on ICMP type and ICMP code.
Flowspec ICMP filtering can either have icmp type or icmp code or both.
Not all combinations are permitted:
- if icmp code is provided, then it is not possible to derive the
correct icmp value. This will not be installed
- range of ICMP is authorised or list of ICMP, but not both.
- on receiving a list of ICMPtype/code, each ICMP type is attempted to
be associated to ICMP code. If not found, then ICMPtype is combined
with all known ICMP code values associated to that ICMP type.
- if a specific ICMP type/code is needed, despite the ICMP code/type
combination does not exist, then it is possible to do it by forging a
FS ICMP type/code specific for that.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Issue 2381: interface based routes not marked "up" when they originate
in zebra, redistributed to bgp vrf, then imported to vpn and then
imported by another vrf.
Routes that are redistributed into BGP from zebra should not get
nexthop tracking (the assumption is that the originating protocol
is responsible to export or withdraw the route according to its own
notion of nexthop status).
The vpn-vrf route-leaking code checks the source route sub_type to
decide whether to use nexthop tracking on the resulting leaked route.
A route that is redistributed from zebra into bgp will have
sub_type==BGP_ROUTE_REDISTRIBUTE. If it is leaked to the vpn RIB,
the resulting vpn RIB route will have sub_type==BGP_ROUTE_IMPORTED.
If THAT vpn route is leaked to another vrf, the original code will
examine only the leak-source route sub_type and, since it is
not BGP_ROUTE_REDISTRIBUTE, will wrongly try to use nexthop tracking
on the new route in the final vrf.
This change modifies the leak function to track back up the
parent links to the ultimate parent of the leak source route
and look at that route's sub_type instead.
Signed-off-by: G. Paul Ziemba <paulz@labn.net>
The route_map_walk_update_list callback function
never uses the return code, so just remove it.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
route_map_mark_updated has a `int del_later` variable
that is passed in but never used. Just remove it.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
When bgp vrf is configured with non-default
RD value, config flag is set.
Upon removing non-default RD value the flag was not reset,
thus displayed default RD value in running-config.
router bgp 5550 vrf vrf1
rd 45.0.2.2:5
Unset the RD configuration flag under bgp_vrf instance.
Ticket:CM-20206
Signed-off-by: Chirag Shah <chirag@cumulusnetworks.com>
This commit fixes the issue mentioned in #2419, which is caused by a
double-free. The problem of the current implementation is that
*bgp_input_modifier* already frees the passed attributes under specific
circumstances, which can then lead to a double-free as *bgp_attr_undup*
does not check if the attributes are set to NULL.
As it is not transparent to the function caller if the attributes get
freed or not and the similar function *bgp_output_modifier* also does
not flush the passed attributes, the line has been removed altogether.
All callers of *bgp_input_modifier* already deal by themself with
freeing/flushing/unduping BGP attributes, so it is safe to remove.
Signed-off-by: Pascal Mathis <mail@pascalmathis.com>
This commit finalizes the previous commits which introduced a generic
approach for making all BGP peer and address-family attributes
overrideable by keeping track of the configuration origin in separate
internal structures.
First of all, the test suite was greatly extended to also check the
internal data structures of peer/AF attributes, so that inheritance for
internal values like 'peer->weight' is also being checked in all cases.
This revealed some smaller issues in the implementation, which were also
fixed in this commit. The test suite now fully passes and covers all the
usual situations that should normally occur.
Signed-off-by: Pascal Mathis <mail@pascalmathis.com>
This commit introduces BGP peer-group overrides for the last set of
peer-level attrs which did not offer that feature yet. The following
attributes have been implemented: description, local-as, password and
update-source.
Each attribute, with the exception of description because it does not
offer any inheritance between peer-groups and peers, is now also setting
a peer-flag instead of just modifying the internal data structures. This
made it possible to also re-use the same implementation for attribute
overrides as already done for peer flags, AF flags and AF attrs.
The `no neighbor <neigh> description` command has been slightly changed
to support negation for no parameters, one parameter or * parameters
(LINE...). This was needed for the test suite to pass and is a small
change without any bigger impact on the CLI.
Signed-off-by: Pascal Mathis <mail@pascalmathis.com>
This commit implements BGP peer-group overrides for the timer flags,
which control the value of the hold, keepalive, advertisement-interval
and connect connect timers. It was kept separated on purpose as the
whole timer implementation is quite complex and merging this commit
together with with the other flag implementations did not seem right.
Basically three new peer flags were introduced, namely
*PEER_FLAG_ROUTEADV*, *PEER_FLAG_TIMER* and *PEER_FLAG_TIMER_CONNECT*.
The overrides work exactly the same way as they did before, but
introducing these flags made a few conditionals simpler as they no
longer had to compare internal data structures against eachother.
Last but not least, the test suite has been adjusted accordingly to test
the newly implemented flag overrides.
Signed-off-by: Pascal Mathis <mail@pascalmathis.com>
This commit cleans up some ugly leftovers from previous flag-override
implementation and refactors the AF-flag override implementation to
match the same behavior the newly added peer-flag override
implementation has.
Signed-off-by: Pascal Mathis <mail@pascalmathis.com>
The current implementation of the overrides for peer address-family
attributes suffered a bug, which caused all peer-specific attributes to
be lost when the peer was added to a peer-group which already had that
specific address-family active.
This commit extends the *peer_group2peer_config_copy_af* function to
respect overridden flags properly. Additionally, the arguments of the
macros *PEER_ATTR_INHERIT* and *PEER_STR_ATTR_INHERIT* have been
reordered to be more consistent and easy to read.
This commit also adds further test cases to the BGP peer attributes test
suite, so that this kind of error is being caught in future commits. The
missing AF-attribute *distribute-list* has also been added to the test
suite.
Signed-off-by: Pascal Mathis <mail@pascalmathis.com>
The current implementation of peer flags (e.g. shutdown, passive, ...)
only has partial support for overriding flags of a peer-group when the
peer is a member. Often settings might get lost if the user toys around
with the peer-group configuration, which can lead to disaster.
This commit introduces the same override implementation which was
previously integrated to support proper peer flag/attribute override on
the address-family level. The code is very similar and the global
attributes now use their separate state-arrays *flags_invert* and
*flags_override*.
The test suite for BGP peer attributes was extended to also check peer
global attributes, so that the newly introduced changes are covered. An
additional feature was added which allows to test an attribute with an
*interface-peer*, which can be configured by running `neighbor IF-TEST
interface`. This was introduced so that the dynamic runtime inversion of
the `extended-nexthop` flag, which is only enabled by default for
interface peers, can also be tested.
Last but not least, two small changes have been made to the current bgpd
implementation:
- The command `strict-capability-match` can now also be set on a
peer-group, it seems like this command slipped through while
implementing peer-groups in the very past.
- The macro `COND_FLAG` was introduced inside lib/zebra.h, which now
allows to either set or unset a flag based on a condition. The syntax
for using this macro is: `COND_FLAG(flag_variable, flag, condition)`
Signed-off-by: Pascal Mathis <mail@pascalmathis.com>
When evpn configured wiht route-map with vni which is not
configured. Upon receiving evpn routes (i.e Type-2, Type-3),
route-map match will be triggered. Since there is no l2vni
exists in db, some of the member fields in bgp_info (i.e.
dummy_info_extra) are passed uninitialized to evpn filter match cb.
This results in inaccessible memory causes crash.
Fix is to memset the bgp_info prior to passing to evpn filter cb.
In evpn vni filter cb, ensure to have NULL check for member filed
of the bgp_info.
memset bgp_info at few places where it is passed to route_match.
Ticket:CM-21335
Reviewed By:
Testing Done:
Configure route-map with not configured l2vni
Simulate to learn l2vpn type-2, 3 route
Restart frr.service with below config
address-family l2vpn evpn
neighbor fear route-map EVPN_VNI out
route-map EVPN_VNI deny 10
match evpn vni 140010
Signed-off-by: Chirag Shah <chirag@cumulusnetworks.com>
Crash w/ an assert if someone calls bgp_delete with a
NULL parameter as opposed to crashing when we dereference
the pointer a bit later.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The process of BGP shutdown hard free's memory irrelevant to
the fact that another process may be using that memory still
in route leaking scenario's.
As a temporary fix find the default instance and free it
last.
Ticket: CM-21068
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The bgp_info_extra_free code was the correct place to free
up data associated with the bgp_info pointer when we are
deleting the bgp_info node.
Additionally, if we have a parent pointer, we may not have a net
pointer. So make sure we do.
Finally clean up the bgp_info_extra_free code so it is a bit
easier to read. Use variables instead of multiple level
of casting.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
L2VNI route-distinguisher and route-target configuartions
should only applied under DEFAULT VRF bgpd instance.
Add newline to each vni display.
router bgp 65006
address-family l2vpn evpn
vni 1000101
route-target import 1:1000101
do not allow under
router bgp 65006 vrf RED
Ticket:CM-20204
Reviewed By:
Testing Done:
Signed-off-by: Chirag Shah <chirag@cumulusnetworks.com>
* Move configure flag propagations out of user flags
* Use AC_SUBST to transfer flag values to Automake
* Set default AM_CFLAGS and AM_CPPFLAGS in common.am and change child
Makefiles to modify these base variables
* Add flag override to turn off all sanitizers when building clippy
* Remove LSAN suppressions blacklist as it's no longer needed
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
With a new version of clang 6.0, the compiler is detecting more
issues where we may be possibly be truncating the output string.
Fix by increasing the size of the output string to make the compiler
happy.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The labeled unicast and unicast tables have been combined
into the unicast table. Additionally we have a restriction
where if you configure labeled unicast you cannot configure
unicast. This created a bug with 'show bgp ipv4 labeled-unicast summ'
command where we were displaying NoNeg, because v4 has been intentionally
turned off.
Modify the code so that when we are looking up if we have negotiated
a capapbility we use the correct one, while still using the appropriate
table for prefix count.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
These two functions are functionally the same, except
bgp_aggregate_route is meant to handle the addition and
deletion of routes, while aggregate_add is meant for all of them.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The aggregated route was being sent in updates to peers every
time a route changed that we were aggregating. Modify
the code such that we only send aggregated route updates
if we actually have something different to tell the peer.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The function bgp_aggregate_delete function was forward
declared and not static. Move it so we can clean that
up.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
This is a transitional commit, to get us where we want to go.
Seperate out the install/removal of the aggregate route from
the bgp_aggregate_delete and bgp_aggregate_route functions.
In the future we'll write a bit of code to determine if the
aggregate add has actually changed any information we care
about.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
We were allowing useless aggregation commands (/32 and /128).
These were being silently accepted and nvgenned and then
just ignored.
When a user enters a value that should be rejected tell
them and reject.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Make bgp_aggregate_route easier to read. It was indented so many
levels that it was extremely hard to figure out what it was doing.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The #define AGGREGATE_NEXTHOP_CHECK has not been used
for a very very long time. Since this is effectively
dead code, let's remove it.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The safi passed in to short-circuit the aggregate lookup
adds code complexity and little speed improvements for
the case where we actually may have aggregates configured!
Since bgp_table_top_nolock() actually tells us if there
are any aggregates installed and safely returns if there
is nothing to do, trust it. As that we know for those
safi's were we don't want to have, we dissallow the
creation via the cli anyways.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The bgp_aggregate_set/unset functions are only called from the cli
invocations which control what AFI/SAFI we are looking at. Tests
for safi are unimportant.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The bgp data structures:
bgp->vnihash
bgp->vrf_export_rtl
bgp->vrf_import_rtl
bgp->l2vnis
Must always be valid data structures. So remove the tests
that ensure that they are.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
When we are determining the state of a peer, we sometimes
detect that we should update the peer->su. The bgp->peer_hash
keeps a hash of peers based upon the peer->su. This requires
us to release the stored value before we re-insert it.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Programs that link to libnetsnmp must be compiled using a special set
of flags as specified by the "net-snmp-config --base-cflags" command
(whose output is stored in the SNMP_CFLAGS variable). The problem is
that "net-snmp-config --base-cflags" can output -std=c99 in addition to
other compiler flags in some platforms, and this breaks the build since
FRR souce code makes use of some GNU compiler extensions (e.g. allow
trailing commas in function parameter lists). In order to solve this
problem, append -std=gnu99 after SNMP_CFLAGS in all makefiles where this
variable is used. This way the -std=c99 flag will be overwritten when it's
present. Source files that don't link to libnetsnmp will be compiled using
either -std=gnu99 or -std=gnu11 depending on the compiler availability.
Fixes#1617.
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
This command needs to be deprecated. It partially implements
a refusal to create multiple instances. If you do not need
multiple instances, just don't create them in the cli instead.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
This command needs to be deprecated. It sets a small variety
of options via the BGP_OPT_CONFIG_CISCO flag. Set for removal
in 1 year.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Cleanup the leaked ecommunity data that we may have on shutdown.
Cleanup leaked vrf name strings on shutdown.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
There exists cases where we will attempt to hard delete
the bgp instance( say a `no router bgp` is issued )
when we have vrf route leaking. If we do have this
lock the bgp instance of the originator and do not
let it be deleted out from under us until we are
finished processing.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
When we are shutting down, there exists a code path
where the connected table leaks some memory. Cleanup
the code to remove the memory.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The rearrangement of where the decision point of
filling in the aggregate information, must have allowed
SA to find a path of code where we may use ifindex uninitialized.
While I don't think this is possible to happen, make this issue
go away.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
We have 2 code paths that were duplicating a bunch of code
for the deletion of connected prefixes.
This simplifies the code path and makes the code look a bit
cleaner.
I did not touch the _add path because the v4 if statement
had some code I did not have time to look into. Future project.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The call to str2prefix_rd when we get to this point of the
code cannot fail. As such let's just ignore the return code.
Found by Coverity SA.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Issue Found by Coverity Scan. When we call peer_clear we
are checking the return code in every other call. Add
a bit of extra code to notice the failure and note it.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The if statement had a second instance of the same variable
to test as part of the statement.
Found by SA.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The aggregate-address command is not creating the null0
route. This got lost somewhere in the last year or so.
Add this ability back for BGP route installs into
zebra.
We need this aggregate route installed into the rib
because we are drawing this traffic to us irrelevant
of the number of routes we do have for that prefix.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
This commit fixes all outstanding style/formatting issues as detected by
'git clang-format' or 'checkpath' for the new peer-group override
implementation, which spanned across several commits.
Signed-off-by: Pascal Mathis <mail@pascalmathis.com>
The previous commit introduced very strict unit tests which check all
three involved components (config input, config output, internal data
structures) which revealed two more bugs in the peer-group override
implementation.
This commit fixes overrides for 'allowas-in <number>' and
'unsuppress-map', which both had a small mistake/typo causing those
issues.
Signed-off-by: Pascal Mathis <mail@pascalmathis.com>
This commit fixes peer-group overrides for inverted AF flags. This
implementation is currently only being used by the three 'send-community'
flags. Commit 70ee29b4d introduced generic support for overriding AF
flags, but did not support inverted flags.
By introducing an additional array on the BGP peer structure called
'af_flags_invert' all current and future flags which should work in an
inverted way can now also be properly overridden.
The CLI commands will work exactly the same way as before, just that 'no
<command>' now sets the flag and override whereas '<command>' will unset
the flag and remove the override.
Signed-off-by: Pascal Mathis <mail@pascalmathis.com>
This commit adds the same peer-group override capabilites as d122d7cf7
for all filter/map options that can be enabled/disabled on each
address-family of a BGP peer.
All currently existing filter/map options are being supported:
filter-list, distribute-list, prefix-list, route-map and unsuppress-map
To implement this behavior, a new peer attribute 'filter_override' has
been added together with various PEER_FT_ (filter type) constants for
tracking the state of each filter in the same way as it is being done
with 'af_flags_override'.
Signed-off-by: Pascal Mathis <mail@pascalmathis.com>
The current implementation for overriding peer-group configuration on a
peer member consists of several bandaids, which introduce more issues
than they fix. A generic approach for implementing peer-group overrides
for address-family flags is clearly missing.
This commit implements a generic and sane approach to overriding
peer-group configuration on a peer-member. A separate peer attribute
called 'af_flags_override' which was introduced in 04e1c5b is being used
to keep track of all address-family flags, storing whether the
configuration is being inherited from the parent-group or overridden.
All address-family flags are being supported by this implementation
(note: flags, not filters/maps) except 'send-community', which currently
breaks due to having the three flags enabled by default, which is not
being properly handled within this commit; all flags are supposed to
have an 'off'/'false' state by default.
In the interest of readability and comprehensibility, the flag
'send-community' is being fixed in a separate commit.
The following rules apply when looking at the new peer-group override
implementation this commit provides:
- Each peer-group can enable every flag (except the limitations noted
above), which gets automatically inherited to all members.
- Each peer can enable each flag independently and/or modify their
value, if available. (e.g.: weight <value>)
- Each command executed on a neighbor/peer gets explicitely set as an
override, so even when the peer-group has the same kind of
configuration, both will show up in 'show running-configuration'.
- Executing 'no <command>' on a peer will remove the peer-specific
configuration and make the peer inherit the configuration from the
peer-group again.
- Executing 'no <command>' on a peer-group will only remove the flag
from the peer-group, however not from peers explicitely setting that
flag.
This guarantees a clean implementation which does not break, even when
constantly messing with the flags of a peer-group. The same behavior is
present in Cisco devices, so people familiar with those should feel safe
when dealing with FRRs peer-groups.
The only restriction that now applies is that single peer cannot
disable a flag which was set by a peer-group, because 'no <command>' is
already being used for disabling a peer-specific override. This is not
supported by any known vendor though, would require many specific
edge-cases and magic comparisons and will most likely only end up
confusing the user. Additionally, peer-groups should only contain flags
which are being used by all peer members.
Signed-off-by: Pascal Mathis <mail@pascalmathis.com>
On the case where an mp_unreach attribute is received, while there is no
mp_reach attribute too, it is not necessary to check for missing
attributes.
Fixes: 67495ddb2e ("bgpd: Fixes for recent well-known-attr check patch.")
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Sometimes at startup, BGP Flowspec may be allocated a routing table
identifier not in the range of the predefined table range.
This issue is due to the fact that BGP peering goes up, while the BGP
did not yet retrieve the Table Range allocator.
The fix is done so that BGP PBR entries are not installed while
routing table identifier range is not obtained. Once the routing table
identifier is obtained, parse the FS entries and check that all selected
entries are installed, and if not, install it.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Added following vty command:
[no] debug bgp pbr error
This permits dumping on the logs some errors related to PBR.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
On the case where an ecom from FS redirect is received, the ecom may be
with the format A.B.C.D:E. On this case, the printable format of the
Flowspec redirect VRF ecom value may use more bytes in the buffer
dedicated for that. The buffer that stores the ecommunity is increased.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
By default, some debug traces were displayed. Those pbr traces are
hidden with 'debug bgp zebra' command.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Currently, uninstall pbr rule is not handled by BGP notification
handler. So the uninstall update of the structure is done, immediately
after sending the request of uninstall to zebra.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
There are cases where a redirect IP or redirect VRF stops the ecom
parsing, then ignores a subsequent rate value, letting passed value to
0. Consequently, a new table identifier may be elected, despite the
routing procedure is the same. This fix ignores the rate value in bpa
list.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
The ecommunity was badly read. This fix ensures that all ecom are reads
and stored in local structure.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
policy routing is configurable via address-family ipv4 flowspec
subfamily node. This is then possible to restrict flowspec operation
through the BGP instance, to a single or some interfaces, but not all.
Two commands available:
[no] local-install [IFNAME]
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Once PBR rules installed, an information is printed in the main
show bgp ipv4 flowspec detail information.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Ability for BGP FS to convert some rules containining at least one
address and one port information into a pbr_match_entry rule.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Those 3 fields are read and written between zebra and bgpd.
This permits extending the ipset_entry structure.
Combinatories will be possible:
- filtering with one of the src/dst port.
- filtering with one of the range src/ range dst port
usage of src or dst is exclusive in a FS entry.
- filtering a port or a port range based on either src or dst port.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
When rule add transaction is sent from bgpd to zebra, the reference
context must not be incremented while the confirmation message of
install has not been sent back; unless if the transaction failed to be
sent.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
On some cases, the ecommunity flowspec for redirect vrf is not displayed
in all cases. On top of that, display the values if ecom can no be
decoded.
Also, sub_type and type are changed from int to u_int8_t, because the
values contains match the type and sub type of extended communities.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
The debugging message in charge of showing if the route is added or
witdrawn is changed accordingly to reflect this status.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Added improved error message text to other places that could also
encounter the same condition. In testing found that in certain
case, duplicate error messages were previously issued. This fix
also removes the duplicates.
Signed-off-by: Don Slice <dslice@cumulusnetworks.com>
Problem reported due to tab completion showing all possible peers
in every vrf, but when neighbor in wrong vrf entered "no such
neighbor" is the error message. Making it slightly more clear
with "no such neighbor in the view/vrf" to clue the user that they
may have specified the wrong vrf.
Signed-off-by: Don Slice <dslice@cumulusnetworks.com>
Move the list_delete_and_null of the virt->vrfs code to
the actual deletion function to ensure proper lifecycle.
This assumption allows us to know that irt->vrfs is always
true so remove the NULL check on it.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The irt->vnis list was being freed on going down,
but actually delete it from the deletion function. Then
we can know that the irt->vnis is a valid list anywhere
we have a irt pointer.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
This option is only implemented by 4 daemons:
- BGPD
- RIPD
- RIPNGD
- Zebra
Manpages and documentation say that the option causes routes to not be
uninstalled from zebra when the daemon terminates. This is true for RIPD
and RIPNGD. This is not true for BGPD; in that daemon it only prevents
transmission of Cease / Peer Unconfig NOTIFICATION messages to peers.
Moreover, when any daemon disconnects from Zebra, all of its routes are
uninstalled from Zebra and the kernel regardless of this option,
rendering the option largely vestigial.
It is still useful in Zebra, where it prevents all routes from being
uninstalled when Zebra shuts down, so it is left there.
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
This commit moves the command 'bgp enforce-first-as' from global BGP
instance configuration to peer/neighbor configuration, which can now be
changed by executing '[no] neighbor <neighbor> enforce-first-as'.
End users can now enforce sane first-AS checking on regular sessions
while e.g. disabling the checks on routeserver sessions, which usually
strip away their own AS number from the path.
To ensure backwards-compatibility, a migration routine was added which
automatically sets the 'enforce-first-as' flag on all configured
neighbors if the old global setting was activated. The old global
command immediately disappears after running the migration routine once.
Signed-off-by: Pascal Mathis <mail@pascalmathis.com>
There exists code paths where the rn was being used after free.
This eliminates these code paths.
Fixes: CM-21019
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
This commit tries to adapt a similar codeflow within the `show bgp [afi]
[safi] neighbor <neighbor> advertised-routes` command compared to its
`received-routes` and `filtered-routes` opponents. Some branching code
has been restructured to achieve this.
Additionally, this commit fixes a memory leak within `received-routes`
(and `filtered-routes`, although the issue has been present before the
previous commit!) where the previous implementation forgot to
deduplicate the BGP attributes.
When a user called `<...> received-routes route-map <RM-TEST>` and that
routemap changed any AS path or community parameters, the duplicated
memory for these parameters was never freed. This has been fixed by
ensuring to call `bgp_attr_undup()` accordingly.
Signed-off-by: Pascal Mathis <mail@pascalmathis.com>
This commit changes the behavior of `show bgp [afi] [safi] neighbor
<neighbor> received-routes [json]` to return all received prefixes
instead of filtering rejected/denied prefixes.
Compared to Cisco and Juniper products, this is the usual way how this
command is supposed to work, as `show bgp [afi] [safi] neighbor
<neighbor> routes` will already return all accepted prefixes.
Additionally, the new command `show bgp [afi] [safi] neighbor <neighbor>
filtered-routes` has been added, which returns a list of all prefixes
that got filtered away, so it can be roughly described as a subset of
"received prefixes - accepted prefixes".
As the already available `filtered_count` variable inside
`show_adj_route` has not been used before, the last output line
summarizing the amount of prefixes found was extended to also mention
the amount of filtered prefixes if present.
Signed-off-by: Pascal Mathis <mail@pascalmathis.com>
The creation of a temporary string for the ecommunity
was being leaked when debugging is enabled. Write
a bit of code to prevent this.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The bgp_info_extra_get call gets the extra pointer, which
is also needed for the setlabels() call, so move the call
to above the setlabels.
Also remove an unnecessary test of a pointer since we
have already dereferenced it by the time we are testing
for it's existence.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
While the current implementation does pay attention to the AF
(inet/inet6) when comparing the IPv4/v6 address against an address-list
/ prefix-list inside a route-map, the AF check is being done rather
late, which leads to CPU cycles being wasted due to unnecessary list
lookups / address matching.
This commit checks the address family of a prefix right inside the
`route_match_ip(v6)_` functions before looking up any address- and/or
prefix-list, which should improve performance.
Signed-off-by: Pascal Mathis <mail@pascalmathis.com>
The current implementation does not respect the AFI+SAFI combination of
a peer when executing a non-soft (hard) clear. An example would be the
command `clear bgp ipv4 unicast *`, which will clear all BGP peers, even
those that do not have IPv4-Unicast activated.
This commit fixes that behavior by applying the same rules to both soft
and hard clear commands, so that peers without a matching AFI+SAFI
combination will be no longer modified.
Additionally, this commit adds warning messages to all `clear bgp
[<afi>] [<safi>] <target>` commands when no matching peers with the given
AFI+SAFI combination could be found.
Both existing and new warning messages have been extended to also
mention the AFI+SAFI combination that is missing, which is more helpful
to the user than a generic expression 'No peer configured'.
Signed-off-by: Pascal Mathis <mail@pascalmathis.com>
The current implementation of building JSON output is greatly different
for large communities compared to standard communities. This is mainly
noticeable by the missing 'list' attribute, which usually offers an
array of all communities present on a BGP route.
This commit adds the missing functionality of properly returning a
'list' attribute in JSON output and also tries a similar approach like
the standard communities are using to implement this feature.
Additionally, the 'format' specifier has been completely removed from
large communities string/JSON rendering, as the official RFC8092 specifies that
there is only one canonical representation:
> The canonical representation of BGP Large Communities is three
> separate unsigned integers in decimal notation in the following
> order: Global Administrator, Local Data 1, Local Data 2. Numbers
> MUST NOT contain leading zeros; a zero value MUST be represented with
> a single zero. Each number is separated from the next by a single
> colon. For example: 64496:4294967295:2, 64496:0:0.
As the 'format' specifier has not been used/checked and only one
canonical representation exists per today, there was no reason to keep
the 'format' parameter in the function signature.
Last but not least, the struct attribute 'community_entry.config' is no
longer being used for large communities and instead 'lcommunity_str' is
being called to maintain a similar approach to standard communities.
As a side effect, this also fixed a memory leak inside 'community_entry_free'
which did not free the allocated memory for the 'config' attribute when
dealing with a large community.
Signed-off-by: Pascal Mathis <mail@pascalmathis.com>
Ensure that when EVPN routes are imported into a VRF as IPv4 routes,
the NEXT_HOP attribute is set. In the absence of this, this attribute
is currently not generated when advertising the route to peers in the
VRF. It is to be noted that the source route (the EVPN route) will only
have the MP_REACH_NLRI attribute that contains the next hop in it.
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
Reviewed-by: Mitesh Kanjariya <mitesh@cumulusnetworks.com>
Reviewed-by: Don Slice <dslice@cumulusnetworks.com>
Imported routes in a VRF routing table have a reference to their parent
route entry which resides in the EVPN or IPVPN routing table. Ensure that
this reference uses appropriate locking so that the parent entry doesn't
get freed prematurely.
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
Reviewed-by: Donald Sharp <sharpd@cumulusnetworks.com>
(cherry picked from commit 13cb6b22ba9d558b1b4a1e8752f63f13242462a7)
Conflicts:
bgpd/bgp_mplsvpn.c
Ticket: CM-20471
Testing Done:
a) Ran vrf_route_leak tests without fix and hit crash, ran twice with fix
and did not see the crash.
b) Ran evpn-smoke and ensured there were no new failures.
The previous implementation of bgp_peer_lookup_next did not consider the
internal ordering of peers when using peer groups, which led to all
standalone peers being skipped that had a lower IP address than the
highest IP address of a peer belonging to a group.
As the ordering of peers can not be arbitrary due to SNMP requiring
increasing OIDs when walking an OID tree, this commit fixes the bug by
properly looping through all peers and detecting the next highest IP
address.
Additionally, this commit improved both bgp_peer_lookup_next and
peer_lookup_addr_ipv4 by using the socketunion stored within the peer
struct (peer->su) instead of calling inet_pton for each peer during
comparison.
Signed-off-by: Pascal Mathis <mail@pascalmathis.com>
When bgp is thinking about opening a connection to a peer,
if we are connected to zebra, allow that to influence our
decision to start the connection.
Found Scenario:
Both bgp and zebra are started up at the same time. Zebra is
being used to create the connected route through which bgp
will establish a peering relationship. The machine is a
bit loaded due to other startup conditions and as such bgp
gets to the connection stage here before zebra has installed
the route. If bgp does not respect zebra data when it does
have a connection then we will attempt to connect. The
connect will fail because there is no route. At that time
we will go into the connect timeout(2 minutes) and delay
connection.
What this does. If we have established a zebra connection and
we do not have a clear path to the destination at this point
do not allow the connection to proceed.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The handling of the return codes for getsockopt was slightly wrong.
getsockopt returns -1 on error and errno is set.
What to do with the return code at that point is dependent
on what sockopt you are asking about. In this case
status holds the error returned for SO_ERROR.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Before this fix, both real neighbors and peer-groups were lumped
together in auto-completion and it didn't work at all for
peer-groups. This fix changes that behavior to do the right
thing.
Signed-off-by: Don Slice <dslice@cumulusnetworks.com>
Ensure that the next hop of the leaked VRF is not overwritten when the
route is being imported into the target VRF from the VPN table. Also, in
the case of multipath routes, ensure that the nexthop's ifindex is not
inadvertently reset.
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
There are situations in which zebra may issue more than one delete
notification, so BGP should not warn when it can't locate the VNI
at delete. This is comparable to the situation when a withdraw is
received but the route isn't present locally.
Signed-off-by: Vivek Venkatraman <vivek@cumulusmetworks.com>
Ticket: CM-17512
Reviewed By: Trivial
Testing Done: Manual
This flag needs to be set by default for l2vpn evpn address-family.
We needed to find a place in the code which gets called by all peers
at somepoint in the statemachine and before the routes are advertised.
peer_new seems like the right place for this
as we are setting other default af_flags here as well.
Signed-off-by: Mitesh Kanjariya <mitesh@cumulusnetworks.com>
In the FRR implementation of EVPN,
eBGP leaf-spine peering for EVPN is fully supported by allowing
the next hop to be propagated and not rewritten at each hop.
There are other changes also related to route import to facilitate this.
However, propagating the next hop is not correct in some cases.
Specifically, if the DC is comprised of multiple PODs
with distinct intra-POD and inter-POD VxLAN tunnels,
EVPN routes received from an adjacent POD by a border/exit leaf
must be propagated into the local POD with the next hop rewritten (to self).
Signed-off-by: Mitesh Kanjariya <mitesh@cumulusnetworks.com>
We install type-5 routes as ipv4/ipv6 unicast routes in the vrf table.
along with these routes, we also install the RMAC
and the nexthop Neigh entries.
There might be scenarios were the bestpath has changed and
we are now pointing to a new nexthop with a different RMAC.
As per BGP logic, we just send an update for the route and the nexthop
is replaced. However, this causes problem because the RMAC and neigh entry
corresponding to the previous nexthop are still lingering in the system.
We need to clear those entries for proper functoning.
Signed-off-by: Mitesh Kanjariya <mitesh@cumulusnetworks.com>
A newly added ipv4/ipv6 route in BGP RIB might be advertised as type-5 EVPN route.
The user might have configured a route-map for advertising type-5 routes.
We need to apply this route-map while advertising ipv4/ipv6 routes as type-5.
Signed-off-by: Mitesh Kanjariya <mitesh@cumulusnetworks.com>
We enable/disable type-5 routes by following commands:
advertise ipv4 unicast [route-map <route-map>]
advertise ipv6 commands [route-map <route-map>]
the route-map part was writtem to conf file.
Signed-off-by: Mitesh Kanjariya <mitesh@cumulusnetworks.com>
no advertise ipv6 unicast command should unset the corresponding
af_flag in bgp_vrf rather than the vrf_flags.
Signed-off-by: Mitesh Kanjariya <mitesh@cumulusnetworks.com>
Display the table version for EVPN routes like it is done for other
address families. Note that this is really relevant only for the
per-VNI routing table.
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
Ticket: CM-12903
Ensure that when EVPN routes are installed into zebra, the router MAC
is passed per next hop and appropriately handled. This is required for
proper multipath operation.
Ticket: CM-18999
Reviewed By:
Testing Done: Verified failed scenario, other manual tests
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
Attribute set on peer was being overridden when set on the peer-group.
This commit also adds a parallel flags array that indicates whether a
particular flag is sourced from the peer-group or is peer-specific. It
assumes the default state of all flags is unset. This looks to be true
except in the case of PEER_FLAG_SEND_COMMUNITY,
PEER_FLAG_SEND_EXT_COMMUNITY, and PEER_FLAG_SEND_LARGE_COMMUNITY; these
flags are set by default except when the user specifies to use
config-type = cisco. However the flag field can merely be flipped to
mean the negation of those options in a future commit.
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
EVPN prefix depends on the EVPN route type.
Currently, in FRR we have a prefix_evpn/evpn_addr which relates to a evpn prefix.
We need to convert this to encompass an union of various EVPN route-types.
This diff handles the necessary code changes to adopt the new struct evpn_addr.
Signed-off-by: Mitesh Kanjariya <mitesh@cumulusnetworks.com>
Handle multiple PREFIX_SID's at the same time. The draft clearly
states that multiple should be handled and we have a actual pcap
file that clearly has multiple PREFIX_SID's at the same time.
Fixes: #2153
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
All messages to/from the label manager include two additional
fields: protocol and instance. This patch fixes the parsing
of label chunks response used by BGPd, which did not consider
the two fields.
Signed-off-by: Fredi Raspall <fredi@voltanet.io>
Upon BGP destroy, the hash list related to PBR are removed.
The pbr_match entries, as well as the contained pbr_match_entries
entries.
Then the pbr_action entries. The order is important, since the former
are referencing pbr_action. So the references must be removed, prior to
remove pbr action.
Also, the zebra associated contexts are removed.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Upon redirect VRF message from FS, add a default route to the VRF
interface associated to the VRF.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
A table chunk of 100000 is allocated from zebra, and when needed in
flowspec, the table identifier is extracted from that chunk.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
If a new rule is identified, a new table identifier is created.
In that table, add a default route when possible. If redirect IP rule is
identified, then add a default route to that IP address.
If redirect VRF is identified, nothing is done for now
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
once an iprule has been created, a notification is sent back, and the
context of bgp_action is searched.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
This commit is reading the installed2 value from bgp_pbr_match hash set.
Once value matches with the one received, the walk stops and the last
bgp_pbr_match structure is stored in a static entry, so that the entry
is obtained.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Once the bgp flowspec entry is validated, then that means that zebra is
able to handle the entries.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
The API for filling in an IPTABLE_ADD and IPTABLE_DELETE message.
Also, the API is handling the notification callback, so as to know if
zebra managed to add or delete the relevant iptable entry.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Add a policy-route API to handle flowspec entry.
The entry is analysed, converted, and
selected if it is possible to inject the flowspec entry in local policy
routing entries.
redirect IP and redirect VRF actions are handled. The former extracts
the IPv4 address to redirect traffic to. The latter calculates the
matching VRF to redirect traffic to.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
This utility routine in bgp ecommunity converts the flowspec actions
into a readable format in a policy routing action context.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
This utility function analyses flowspec nlri and converts it into
readable structures. The structure is based on bgp_pbr_match structure
previously defined.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
This structure is the model exchange between some bgp services like
flowspec and the policy routing service. This structure reflects what
the nlri entry means. To handle that structure, a dump routine is made
available. Also, a validation function is here to cancel a policy route
installation, whenever it is not possible to install the requested
policy routing.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
This command is used to troubleshoot the routes that are installed inbgp
pbr fib, before being injected in zebra.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
bgp structure is being extended with hash sets that will be used by
flowspec to give policy routing facilities.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
The APIs that handle ipset and iprule contexts from zebra are being
handled in this commit.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
BGP flowspec will be able to inject or remove policy-routing contexts,
thanks to some protocols like flowspec. This commit adds some the APIS
necessary to create/delete policy routing contexts that will be injected
then into zebra.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
As part of recent vpn-vrf leaking changes, it is now possible for a
route to refer to a nexthop in a different vrf. There is also a new
route flag that means "when announcing this route, indicate myself
as the next-hop."
route_vty_out(): nexthops are appended with:
"@VRFID" (where VRFID is the numerical vrf id) when different from
the route's vrf;
"<" when the route's BGP_INFO_ANNC_NH_SELF is set
This change also shows the route table's vrf id in the table header.
route_vty_out_detail(): show nexthop's vrf and announce-nh-self flag if
appropriate.
Both functions are also augmented to add json elements nhVrfId, nhVrfName,
and announceNexthopSelf as appropriate.
The intent of these changes is to make it easier to understand/debug
the relationship between a route and its nexthops.
Signed-off-by: G. Paul Ziemba <paulz@labn.net>
Ensure that when EVPN routes are installed into zebra, the router MAC
is passed per next hop and appropriately handled. This is required for
proper multipath operation.
Ticket: CM-18999
Reviewed By:
Testing Done: Verified failed scenario, other manual tests
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
The vrf 2 vrf route leaking auto-derives RD and RT and
installs the routes into the appropriate vpn table.
These routes when a operator configured ipv[4|6] vpn
neighbors were showing up off box. The RD and RT
values choosen are localy significant but globaly
useless and may cause confusion.
Put a special bit of code in to notice that we
should not be advertising these routes off box.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
This commit reverts part of ceb800e0edb9f8979cebb1e6be9497d787bee39c
as it was found to be causing issues in upstream CI.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The loop over all afi's implies that these commands actually need
to loop over all afi's to check the vpn policy. We know the
appropriate afi based upon the node we are in. So just return
the correct afi to look at and then just apply it.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Prior to this fix, you could configure importing a vrf from inside
the same vrf. This can lead to unexpected behavior in the leaking
process. This fix disallows that behavior.
Ticket: CM-20539
Signed-off-by: Don Slice <dslice@cumulusnetworks.com>
Tripped over a crash running the cli_crawler that occurred when the
sequence was doing "import vrf NAME" and "no import vrf NAME" inside
a vrf but a default bgp instance had not been created. This fix
auto-creates the default instance if the "import vrf NAME" is
entered and a default instance does not exist.
Ticket: CM-20532
Signed-off-by: Don Slice <dslice@cumulusnetworks.com>
Reviewed-by: Donald Sharp <sharpd@cumulusnetworks.com>
Prior to this fix, the import vrf route-map command only worked
if the route-map existed prior to the command. Additionally, if
the import vrf route-map command was issued without an existing
route-map, the imported prefixes were not removed. This fix
resolves both of thes mis-behaviors. bgp-smoke run with same
failures as base.
Ticket: CM-20459
Signed-off-by: Don Slice <dslice@cumulusnetworks.com>
Reviewed-by: CCR-7358
Found that when doing "import vrf default" in another vrf, an
extra line was added to the configuration in error. This fix
resolves that incorrect configuration. Manual testing will be
attached to the defect.
Ticket: CM-20467
Signed-off-by: Don Slice <dslice@cumulustnetworks.com>
Reviewed by: Donald Sharp <sharpd@cumulusnetworks.com>
The usage of MTYPE_ECOMMUNITY for the free in ecommunity_del_val
caused the ref counts for the ecommunity to be incorrect.
Use MTYPE_ECOMMUNITY_VAL since that is what we are deleting.
Ticket: CM-20602
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
There were a couple of instances of code extending
beyond 80 columns, clean it up with clang-format.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Note that when we are importing vrf EVA into vrf DONNA
we must keep track of all the vrfs EVA is being
exported into and we must also keep track of all the vrf's
that DONNA is receiving data from as well.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The data type for a variable in bgp_ecommunity.c was
a non-standard type and was causing build failures
on some more obscure build targets.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Added the cli for doing route-map filtering on imported routes via
the new "import vrf route-map <NAME> command.
Signed-off-by: Don Slice <dslice@cumulusnetworks.com>
Connected routes redistributed into BGP as well as IPv4 routes with IPv6
link-local next hops (RFC 5549) need information about the associated
interface in BGP if they are candidates to be leaked into another VRF. In
the absence of route leaking, this was not necessary. Introduce the
appropriate mechanism and ensure this is used during route install (in
the target VRF).
Ticket: CM-20343, CM-20382
Testing done:
1. Manually verified failed scenarios and some additional ones - logs
in the tickets.
2. Ran bgp-min and evpn-min - results are good.
3. Ran vrf smoke - has some failures, but none which look new
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
Ensure that when a route redistributed into a VRF is subsequently
deleted, it is properly removed from the VPN table (if exported)
so that it can be removed from other VRFs and withdrawn from
L3VPN peers.
Ticket: CM-20345
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
The VRF routes exported to the global VPN table must not be
imported routes. It is not necessary to check if they originate
in the global VPN instance as that doesn't hold good for VRF-to-
VRF route-leaking. Merely checking that they are not imported
should handle both L3VPN and VRF-to-VRF route-leaking use cases.
Ticket: CM-20283
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
When routes are imported into a VRF from the global VPN table, the
parent instance is either the default instance in the case of L3VPN or
the source VRF in the case of VRF-to-VRF route leaking. Hence, obtain
the source peer by just looking at the parent route information.
Ticket: CM-20283
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
Implement fixes for route leaking between VRFs through BGP, especially for
the scenario where routes are leaked from a VRF X to multiple other VRFs.
This include making sure that import and export happen via the global VPN
table, setting RD correctly and proper handling for multiple import/export.
Ticket: CM-20256
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
Reviewed-by: Donald Sharp <sharpd@cumulusnetworks.com>
Setup a per-VRF identifier to use along with the Router Id to build the
RD. Define a function to encode the RD. Code is brought over from EVPN
and EVPN code has been modified to use the generic function.
Ticket: CM-20256
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
When routes are injected into the VPN table and then leaked into candidate
VRFs, the source should be the default instance. Also, the loop check when
withdrawing routes from a VRF should be that the route's origin isn't that
VRF; this handles VRF route leaking also and is consistent with checks in
other places.
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
(cherry picked from commit 0149d2848c59bfb3277592caf0a5d5e07a2de872)
Ticket: CM-20256
When the `import vrf XXX` command is entered under
an afi/safi for bgp and the XXX vrf bgp instance
does not yet exist, auto-create it using the same
ASN that the we are importing into.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
add the `import vrf XXXX` command
router bgp 4 vrf DONNA
<config>
!
router bgp 4 vrf EVA
<config>
address-family ipv4 uni
import vrf DONNA
!
!
This command will allow for vrf EVA to specify that it would like
to receive the routes from vrf DONNA into it's table.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
RFC 8635 explains how RT auto-derivation should be done in section
5.1.2.1 [1]. In addition to encoding the VNI in the lowest bytes, a
3-bit field is used to encode a namespace. For VXLAN, we have to put 1
in this field. This is needed for proper interoperability with RT
auto-derivation in JunOS. Since this would break existing setup, an
additional option, "autort rfc8365-compatible" is used.
[1]: https://tools.ietf.org/html/rfc8365#section-5.1.2.1
Signed-off-by: Vincent Bernat <vincent@bernat.im>
bgp route extra->bgp_orig for routes leaked vpn->vrf should be set
to original extra->bgp_orig if it is set, not vpn's bgp instance.
The initial leak is OK because it goes through a loopback path
in the vrf->vpn leaking code, but it is possible later re-leaks (e.g.,
if the destination vrf's leak configuration is changed) could
set the wrong extra->bgp_orig and break the route's nexthop.
Signed-off-by: G. Paul Ziemba <paulz@labn.net>
When sending a bgp route down to zebra for deletion, the
ZEBRA_FLAG_ALLOW_RECURSION and ZEBRA_FLAG_IBGP flags
are not needed in zebra. So remove the setting
of the api.flags. If we remove this data from being
passed down we no longer need the peer data structure.
Remove the lookup of the peer data structure and the setting
of the flags as that peer was NULL in some evpn symmetric
routing cases for shutdown of bgp.
Ticket: CM-20720
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
given a configuration such as this:
router bgp 7777 vrf A
address-family ipv4 unicast
route-map vpn import FOO
import vpn
or this:
router bgp 7777 vrf A
address-family ipv4 unicast
rd vpn export 1:3
rt vpn export 1:100
route-map vpn export FOO
export vpn
Previous code allowed leaking if the named FOO route-map was not defined.
Since the configuration is logically incomplete, if a route-map is named
for "vpn export" or "vpn import" but is not defined, leaking should not
occur until the route-map is defined.
This changeset implements the correct behavior.
Signed-off-by: G. Paul Ziemba <paulz@labn.net>
* Remove unused parameter
* Restore behavior described by function comment
* Eliminate NPD caught by static analysis
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
vpn-vrf leak code was not triggering a BGP update or an announce to zebra
if a route's labels changed. This changeset corrects that problem.
Signed-off-by: G. Paul Ziemba <paulz@labn.net>
The return value of XCALLOC will always be non-null. Even if it were to
be null, this code would still crash with a NPD.
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
Here we have a block conditional on the nullity of a pointer, followed
by a dereferennce of the same pointer. Move the deref into the
conditional block.
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
clang-analyze complains that data may be null, and since we didn't
explicitly check it (although we did check the overall packet length
minus the header length) it has a point.
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
We have changed the flow in which we advertise the VNI subnet.
We will mark this command as hidden for all future purposes.
Signed-off-by: Mitesh Kanjariya <mitesh@cumulusnetworks.com>
Add support for CLI "auto" keyword in vrf->vpn export label:
router bgp NNN vrf FOO
address-family ipv4 unicast
label vpn export auto
exit-address-family
Signed-off-by: G. Paul Ziemba <paulz@labn.net>
MPLS label pool backed by allocations from the zebra label manager.
A caller requests a label (e.g., in support of an "auto" label
specification in the CLI) via lp_get(), supplying a unique ID and
a callback function. The callback function is invoked at a later
time with the unique ID and a label value to inform the requestor
of the assigned label.
Requestors may release their labels back to the pool via lp_release().
The label pool is stocked with labels allocated by the zebra label
manager. The interaction with zebra is asynchronous so that bgpd
is not blocked while awaiting a label allocation from zebra.
The label pool implementation allows for bgpd operation before (or
without) zebra, and gracefully handles loss and reconnection of
zebra. Of course, before initial connection with zebra, no labels
are assigned to requestors. If the zebra connection is lost and
regained, callbacks to requestors will invalidate old assignments
and then assign new labels.
Signed-off-by: G. Paul Ziemba <paulz@labn.net>
In general, routes leaked from the vpn rib to a vrf include any
labels that might have been attached to the vpn route. VRF routes
that have labels attached require a label-switched path and therefore
require nexthops with labels in order to be marked valid by the
nexthop-tracking logic.
However, some routes in the vpn RIB originated in vrfs local to this
router. Even though they may have labels, we must omit the labels
when leaking to a vrf because traffic using those resulting routes
will be carried by this router via IP routing and not label switching.
The nexthops of these routes do not need to indicate a label-switched
path, and thus the routes should be marked valid even when their nexthops
do not have labels.
This changeset omits labels from vpn->vrf leaked routes when the ultimate
source of the vpn route was a local vrf.
Signed-off-by: G. Paul Ziemba <paulz@labn.net>
Ethernet Tag ID (ETI) is part of the prefix. It cannot just be ignored
as it needs to be used when checking for prefix uniqueness. Moreover,
when using Quagga as a route reflector, we need to keep its
value. Therefore, we correctly parse and encode it. We also parse
ESI. While not part of the prefix, it needs to be reflected correctly
by Quagga.
Signed-off-by: Vincent Bernat <vincent@bernat.im>
Most presumably, the nexthop IP is present, only when ECOM redirect IP
is present. The nexthop is displayed.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
This extended ecommunity is defined with
draft-ietf-idr-flowspec-redirect-ip-02 and is read from the BGP update
received.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Routes that have labels must be sent via a nexthop that also has labels.
This change notes whether any path in a nexthop update from zebra contains
labels. If so, then the nexthop is valid for routes that have labels.
If a nexthop update has no labeled paths, then any labeled routes
referencing the nexthop are marked not valid.
Add a route flag BGP_INFO_ANNC_NH_SELF that means "advertise myself
as nexthop when announcing" so that we can track our notion of the
nexthop without revealing it to peers.
Signed-off-by: G. Paul Ziemba <paulz@labn.net>
snprintf routine is used widely, when the handler routine in charge of
displaying the output is called.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
FS UNREACH message with 0 NLRI inside is sent after each peer
establishment. FS can send NLRI messages with no nexthop.
The commit fixes a message that is triggered by mistake
if FS was about to be sent, then that message is not output.
Also it fixes a typo.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
This command gives detail about a FS entry which contains an IP that
matches one of the rules of the FS entry. The output is the same output
as when one does show bgp ipv4 flowspec detail
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
The json format is returd when requested from the two commands:
- show bgp ipv4 flowspec
- show bgp ipv4 flowspec detail
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
It is possible to enhance debug bgp flowspec feature by using vty
command. This command, if enabled, will dump the match/set couple of
information received on NLRI.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
The show bgp ipv4 flowspec routine is made available, displays the
flowspec rules contained in the BGP FIB database, as well as the actions
to be done on those rules. Two routines are available:
show bgp ipv4 flowspec
show bgp ipv4 flowspec detail
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
The changes introduce validation of NLRI FS entries at incoming, before
being pushed in FIB. Note that the so called validation only checks for
validity of the incoming flowspec nlri format, and not the validation as
per RFC5575.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
The FS nlri is depicted so as to be able to be in readable format,
either by human, or by some other ( remote daemon ?).
This work is a derived work from [0]. Initially done for validation
only, this work is extended.
The FS NLRI is able to decode addresses, numbers ( protocols, ports,
tcp values) combined ( or not) with operators. This makes possible
to have a NLRI entry for a set of ports, and/or for an other set of
value of port.
This implementation mainly brings the API visible. The API should be
consistent across the various usages.
[0] https://github.com/chinatelecom-sdn-group/quagga_flowspec/
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Signed-off-by: jaydom <chinatelecom-sdn-group@github.com>
Flowspec entries do not need aggregation feature.
Actually, all flowspec entries are unique.
So, some check is done against aggregate functionalities in the code.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
This work is derived from a work done by China-Telecom.
That initial work can be found in [0].
As the gap between frr and quagga is important, a reworks has been
done in the meantime.
The initial work consists of bringing the following:
- Bringing the client side of flowspec.
- the enhancement of address-family ipv4/ipv6 flowspec
- partial data path handling at reception has been prepared
- the support for ipv4 flowspec or ipv6 flowspec in BGP open messages,
and the internals of BGP has been done.
- the memory contexts necessary for flowspec has been provisioned
In addition to this work, the following has been done:
- the complement of adaptation for FS safi in bgp code
- the code checkstyle has been reworked so as to match frr checkstyle
- the processing of IPv6 FS NLRI is prevented
- the processing of FS NLRI is stopped ( temporary)
[0] https://github.com/chinatelecom-sdn-group/quagga_flowspec/
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Signed-off-by: jaydom <chinatelecom-sdn-group@github.com>
Ensure that only EVPN routes are flagged as such when installing into or
withdrawing from zebra, the earlier check broke L3VPN or VRF route-leaked
routes. Also, fix an incorrect check related to imported routes in path
selection.
Updates: bgpd: Use BGP_ROUTE_IMPORTED for EVPN [vivek]
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
When an L3 VNI is deleted, cleanup linkage to it from associated
L2 VNIs.
Updates: bgpd: keep a backpointer to vrf instance in struct bgpevpn
[Mitesh Kanjariya]
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
Reviewed-by: Mitesh Kanjariya <mitesh@cumulusnetworks.com>
BGP is calculating a v6 routes nexthop as the nexthop address
+ an ifindex. The ifindex calculated comes from where we received
the route from as that we have to do this for LL addresses.
But a v6 address that is not a LL we do not need to provide
to zebra for nexthop resolution because a global address
by default can be looked up and resolved appropriately.
Modify the code so that we must have an ifindex for a v6 nexthop
if the address is LL, else don't pass the ifindex down to zebra.
Fixes: #1986
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
In BGP, doing policy-routing requires to use table identifiers.
Flowspec protocol will need to have that. 1 API from bgp zebra has been
done to get the table chunk.
Internally, onec flowspec is enabled, the BGP engine will try to
connect smoothly to the table manager. If zebra is not connected, it
will try to connect 10 seconds later. If zebra is connected, and it is
success, then a polling mechanism each 60 seconds is put in place. All
the internal mechanism has no impact on the BGP process.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
The following types are nonstandard:
- u_char
- u_short
- u_int
- u_long
- u_int8_t
- u_int16_t
- u_int32_t
Replace them with the C99 standard types:
- uint8_t
- unsigned short
- unsigned int
- unsigned long
- uint8_t
- uint16_t
- uint32_t
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
This commit is relying on bgp vpn-policy. It is needed to configure
several bgp vrf instances, and in each of the bgp instance, configure
the following command under address-family ipv4 unicast node:
[no] rt redirect import RTLIST
Then, a function is provided, that will parse the BGP instances.
The incoming ecommunity will be compared with the configured rt redirect
import ecommunity list, and return the VRF first instance of the matching
route target.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Previous patches to suppress display of automatically calculated
coalesce-time did not fully work because the flag indicating whether the
value was automatically calculated was not set properly upon creation.
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
When uninstalling routes from zebra, ensure that the BGP instance for
which processing is being done is used to derive the VRF. It is incorrect
to derive the VRF from the peer when dealing with scenarios like VRF route
leaking, EVPN symmetric/external routing etc., where the peer which sourced
the route could belong to a different VRF.
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
Reviewed-by: Donald Sharp <sharpd@cumulusnetworks.com>
Reviewed-by: Mitesh Kanjariya <mitesh@cumulusnetworks.com>
Ticket: CM-18413
Reviewed By: CCR-6778
Testing Done: Manual testing of BGP route withdraw/delete, bgp-min
When a peer is removed the routes are withdrawn via bgp_process_main_one
As such we need to put a bit of code in to handle this situation
for the vpn/vrf route leaking code.
I think this code path is also called for when a vrf's route is
changed and I believe we will end up putting a bit more code
here to handle the nexthop changes.
I've also started trying to document the bgp_process_main_one
function a bit better.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The str2prefix_rd function can fail, but for auto-derived
values this should be impossible to happen. So ignore it.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
This commit does these 2 things:
1) irt->vrfs is never NULL so no need to test for it
2) No need to check for a good irt value returned from
vrf_import_rt_new as that the alloc operation will
dump if memory allocation fails.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
We lock and set peer->bgp at peer creation and only
remove it at deletion. Therefore these tests are
not needed.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
- vpn_leak_to_vpn_active(): check instance type
- vpn_leak_prechange(): qualify with test for active
- vpn_leak_postchange(): remove duplicated call to
vpn_leak_from_vrf_update_all()
- bgp_vty.c: Avoid null-pointer dereference for command "no rt vpn import"
Signed-off-by: G. Paul Ziemba <paulz@labn.net>
PR #1739 added code to leak routes between (default VRF) VPN safi and unicast RIBs in any VRF. That set of changes included temporary CLI including vpn-policy blocks to specify RD/RT/label/&c. After considerable discussion, we arrived at a consensus CLI shown below.
The code of this PR implements the vpn-specific parts of this syntax:
router bgp <as> [vrf <FOO>]
address-family <afi> unicast
rd (vpn|evpn) export (AS:NN | IP:nn)
label (vpn|evpn) export (0..1048575)
rt (vpn|evpn) (import|export|both) RTLIST...
nexthop vpn (import|export) (A.B.C.D | X:X::X:X)
route-map (vpn|evpn|vrf NAME) (import|export) MAP
[no] import|export [vpn|evpn|evpn8]
[no] import|export vrf NAME
User documentation of the vpn-specific parts of the above syntax is in PR #1937
Signed-off-by: G. Paul Ziemba <paulz@labn.net>
We have a signed/unsigned comparison warning that this should
fix.
This should be ok because the range of input is a very limited
value and should never be of concern
Fixes: #1919
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Set EVPN routes imported into a VRF to (sub)type BGP_ROUTE_IMPORTED and
use this for passing appropriate information to zebra. This is needed
because relying on the Router MAC for this purpose was incorrect and
impacted routing to/from external destinations, particularly for IPv6.
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
Place the nexthop vrf id setting outside the afi if split
so that each side does not need to keep track of this.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
In bgp_update_receive the first thing we do is establish
that the peer->status is Established. We then do a bunch
of work and call bgp_nlri_parse where we break out for
each address family. Each AFI is then checking for
being peer->status is Established again. There is no
point in checking this again.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
- add "debug bgp vpn label" CLI
- improved debug messages for "debug bgp bestpath"
- send vrf label to zebra after zebra informs bgpd of vrf_id
- withdraw vrf_label from zebra if zebra informs bgpd that vrf_id is disabled
Signed-off-by: G. Paul Ziemba <paulz@labn.net>
The work_queue_free function free'd up the wq pointer but
did not set it too NULL. This of course causes situations
where we may use the work_queue after it is freed. Let's
modify the work_queue to set the pointer for you.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
This commit fixes the handling of incoming parameters passed in
following vty functions:
clear ip bgp ipv6 [safi] prefix []
clear ip bgp [vrf ] ipv6 [safi] prefix []
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
When VRF is not yet available at startup, the check for main socket
presence must be done. As the main socket creation is made in a separate
place from vrf socket for netns, ths main socket creation must not be
prevented when a BGP VRF relies on vrf lite mechanism.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Received PMSI tunnel attributes (in EVPN type-3 route) were not recognized.
Parse them and display the tunnel type when looking at routes. Note that
the only tunnel type currently supported is ingress replication (IR). A
warning message will be logged if the received tunnel type is something
else, but the attribute is otherwise ignored.
Updates: a21bd7a (bgpd: add PMSI_TUNNEL_ATTRIBUTE to EVPN IMET routes)
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
This capability, when used, is mapped over linux sys_admin capability.
This is necessary from the daemon perspective, in order to handle NETNS
based VRFs, because calling setns() requires sys admin capability.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Implement support for EVPN symmetric routing for IPv6 routes. The next hop
for EVPN routes is the IP address of the remote VTEP which is only an IPv4
address. This means that for IPv6 symmetric routing, there will be IPv6
destinations with IPv4 next hops. To make this work, the IPv4 next hops are
converted into IPv4-mapped IPv6 addresses.
As part of support, ensure that "L3" route-targets are not announced with
IPv6 link-local addresses so that they won't be installed in the routing
table.
Signed-off-by: Vivek Venkatraman vivek@cumulusnetworks.com
Reviewed-by: Mitesh Kanjariya mitesh@cumulusnetworks.com
Reviewed-by: Donald Sharp sharpd@cumulusnetworks.com
We have af_flags in struct bgp which holds address family related flags.
Seems like we had a conflict between two flags.
Signed-off-by: Mitesh Kanjariya <mitesh@cumulusnetworks.com>
Instead of relying on local usage of vrf bind operation, the vrf API for
that usage is done.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
fixup bgp
Because socket creation is tightly linked with socket binding for vrf
lite, the proposal is made to extend socket creation APIs and to create
a new API called vrf_bind that applies to vrf lite. The passed interface
name is the interface that will be bound to the socket passed.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
NETNS is initialised from the VRF, instead of being directly called,
because this is not up to BGP daemon to initialise the various VRF
backend.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Upon starting a BGP VRF instance, the server socket is not created,
because the VRF ID is not known, and then underlying VRF backend is not
ready yet. Because of that, the peer connection attempt will not be
started before.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Upon creation of BGP instances, server socket may or may not be created.
In the case of VRF instances, if the VRF backend relies on NETNS, then
a new server socket will be created for each BGP VRF instance. If the
VRF backend relies on VRF LITE, then only one server socket will be
enough. Moreover, At startup, with BGP VRF configuration, a server
socket may not be created if VRF is not the default one or VRF is not
recognized yet.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
The change contained in this commit does the following:
- discovery of vrf id from zebra daemon, and adaptation of bgp contexts
with BGP.
The list of network addresses contain a reference to the bgp context
supporting the vrf.
The bgp context contains a vrf pointer that gives information about
the netns path in case the vrf is a netns path.
Only some contexts are impacted, namely socket creation, and retrieval
of local IP settings. ( this requires vrf identifier).
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
The PMSI attribute is only applicable to EVPN type-3 route.
Rmac is applicable to type-2 and type-5 routes.
We should attach these attributes appropiately based on route-type.
Signed-off-by: Mitesh Kanjariya <mitesh@cumulusnetworks.com>
We will keep a backpointer to bgp vrf instance in bgpevpn.
struct bgpevpn denotes a l2vni and bgp_vrf corresponds to l3vni.
A back pointer to the vrf will provide efficient
access to vrf when needed.
Signed-off-by: Mitesh Kanjariya <mitesh@cumulusnetworks.com>
When a non-default vrf rd is configured under l2vpn evpn in a vrf,
we need to update the config file.
Signed-off-by: Mitesh Kanjariya <mitesh@cumulusnetworks.com>
The ZEBRA_FLAG_INTERNAL flag is used to signal to zebra that
the route being added, the nexthops for it can be recursively
resolved. This name keeps throwing me off when I read it
so let's rename to something that allows the developer to
understand what is going on.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Signed-off-by: Daniel Walton <dwalton@cumulusnetworks.com>
peer->ifindex was only used in two places but it was never populated so
neither of them worked as they should. 'struct peer' also has a 'struct
interface' pointer which we can use to get the ifindex.
A Border Leaf can originate a default route
for all the leafs within the POD.
However, we do not want to advertise this route outside the POD.
Therefore, we provide an option
to filter a EVPN type5 default route through a route-map.
Signed-off-by: Mitesh Kanjariya <mitesh@cumulusnetworks.com>
Implement support for 'default-originate' for L2VPN/EVPN address family.
This is needed for the case where external routing within a POD,
will follow the default route to the border/exit leaf.
The border leaf has more than one next hop to forward the packet on to,
depending on the destination IP.
Signed-off-by: Mitesh Kanjariya <mitesh@cumulusnetworks.com>
We have af_flags in struct bgp to hold address family related flags,
l2vpn evpn flags to indicate advertise ipvX unicast should be moved there.
Signed-off-by: Mitesh Kanjariya <mitesh@cumulusnetworks.com>
Type-5 routes can be useful in multiple scenarios such as advertise-subnet,
default-originate etc. Currently, the code has a restriction that to allow
advertising type-5 routes, user has to first enable advertise ipvX command.
This restriction is not necessary and should be removed.
Signed-off-by: Mitesh Kanjariya <mitesh@cumulusnetworks.com>
Incorrect check for sentinel value effectively caused peers to sometimes
use the keepalive timer value of other peers, which sometimes led to
hold timer expiry.
Signed-off-by: Quentin Young <qlyoung@qlyoung.net>
The value 'pnt' was being set but never used. If we need
this in the future it will be a simple thing to add back
in.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Under a BGP VRF instance, prevent from entering in vrf-policy mode. This
mode is reserved for non VRF instances that want to handle several VRF
at the same time.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Signed-off-by: Daniel Walton <dwalton@cumulusnetworks.com>
This worked for unnumbered peers but not for numbered peers. This is
before the fix:
router bgp 100
coalesce-time 1000
neighbor FOO peer-group
neighbor FOO remote-as external
neighbor swp1 interface peer-group FOO
neighbor 1.1.1.1 peer-group FOO
!
line vty
exec-timeout 0 0
!
end
cel-redxp-10# wr
Note: this version of vtysh never writes vtysh.conf
Building Configuration...
Integrated configuration saved to /etc/frr/frr.conf
[OK]
cel-redxp-10# conf t
cel-redxp-10(config)# router bgp
cel-redxp-10(config-router)# no neighbor swp1 interface peer-group FOO
cel-redxp-10(config-router)# no neighbor 1.1.1.1 peer-group FOO
cel-redxp-10(config-router)# do show run
Building configuration...
Current configuration:
!
frr version 4.1-dev
frr defaults datacenter
hostname cel-redxp-10
!
service integrated-vtysh-config
!
password cn321
!
log syslog
!
router bgp 100
coalesce-time 1000
neighbor FOO peer-group
neighbor FOO remote-as external
neighbor 1.1.1.1 remote-as external
!
address-family ipv4 unicast
no neighbor 1.1.1.1 activate
exit-address-family
!
line vty
exec-timeout 0 0
!
end
cel-redxp-10(config-router)#
After the fix "no neighbor 1.1.1.1 peer-group FOO" removes the 1.1.1.1
neighbor.
FRR/CL provides the means for injecting regular (IPv4) routes
from the BGP RIB into EVPN as type-5 routes.
This needs to be enhanced to allow selective injection.
This can be achieved by adding a route-map option
for the "advertise ipv4/ipv6 unicast" command.
Signed-off-by: Mitesh Kanjariya <mitesh@cumulusnetworks.com>
Asymmetric routing is an ideal choice when all VLANs are cfged on all leafs.
It simplifies the routing configuration and
eliminates potential need for advertising subnet routes.
However, we need to reach the Internet or global destinations
or to do subnet-based routing between PODs or DCs.
This requires EVPN type-5 routes but those routes require L3 VNI configuration.
This task is to support EVPN type-5 routes for prefix-based routing in
conjunction with asymmetric routing within the POD/DC.
It is done by providing an option to use the L3 VNI only for prefix routes,
so that type-2 routes (host routes) will only use the L2 VNI.
Signed-off-by: Mitesh Kanjariya <mitesh@cumulusnetworks.com>
When an l3-vni is enabled, type-2 routes are sent with 2 labels (l2vni and l3vni).
When it gets deleted, we need to update type-2 routes and send them with only one label (l2vni).
Signed-off-by: Mitesh Kanjariya <mitesh@cumulusnetworks.com>
Ensure that if multiple parameters for a VNI change simultaneously, the
changes are processed correctly. The changes of interest are the local
tunnel IP address and the tenant VRF to which this VNI is attached. The
former is used to originate type-3 routes as well as set the next hop of
all routes, the latter helps to determine the route targets and VNIs to
include in the route.
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
Reviewed-by: Mitesh Kanjariya <mitesh@cumulusnetworks.com>
Ticket: CM-19099
Reviewed By: CCR-7102
Testing Done:
1. Manually reproduced problem and verified fix.
2. Additional trigger events tested with fix.
We need a better error message. "Multiple BGP processes are configured"
doesnt makes sense anymore as with l3vni,
we could have multiple auto configured bgp instances.
Signed-off-by: Mitesh Kanjariya <mitesh@cumulusnetworks.com>
EVPN type-2 and type-5 routes received with a L3 VNI and corresponding RTs
are installed into the appropriate BGP RIB. Ensure that these routes are not
re-injected back into EVPN as type-5 routes when type-5 advertisement is
enabled; only regular IPv4 routes (and IPv6 routes in future) in the RIB
should be injected into EVPN.
As a benefit of this change, no longer restrict that EVPN type-5 routes
should be non-host routes - i.e., allow /32 IPv4 routes (and /128 IPv6
routes in future).
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Signed-off-by: Mitesh Kanjariya <mitesh@cumulusnetworks.com>
Ticket: CM-19456
Reviewed By: CCR-7117
Testing Done:
1. Manual replication of problem and verification of fix
2. evpn-min
Modify mpls.h to rename MPLS_LABEL_ILLEGAL to be MPLS_LABEL_NONE.
Fix all pre-existing code that used MPLS_LABEL_ILLEGAL.
Modify the zapi vrf label message to use MPLS_LABEL_NONE as the
signal to remove label associated with a vrf.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Turns out we had 3 different ways to define labels
all of them overlapping with the same meanings.
Consolidate to 1. This one choosen is consistent
naming wise with what the *bsd and linux kernels
use.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
While processing the type-2 evpn route nlri,
increment the pointer by BGP_LABEL_BYTES instead of '3'
Signed-off-by: Mitesh Kanjariya <mitesh@cumulusnetworks.com>
Create a zapi_nexthop_update_decode function that both
pim and bgp use to decode the message from zebra.
There probably could be further optimizations but I opted
to keep the code as similiar as is possible between the
originals because they both make some assumptions about
code flow that I do not fully understand yet.
The real goal here is that I want to create a new
user of the nexthop tracking code from a higher level
daemon and I see no need to re-implement this damn
code again for a 3rd time.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
We've run across an issue where the local connected
ip address is not being removed in some error condition.
During trackdown it was noticed that we cannot look
at this table for views/vrf's. While we don't have the
bug tracked down yet this will help us figure it out.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
None of these variables can actually be used before being initialized,
but unfortunately some old compilers are not smart enough to detect that.
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
Problem reported with output of the command "show bgp vrf all
neighbor x.x.x.x" not limiting the output to that peer in any vrf.
This fix corrects the logic to display by neighbor
(ipv4/ipv6/interface) in any vrf.
Ticket: CM-17377
Signed-off-by: Don Slice <dslice@cumulusnetworks.com>
In many situations, it is desirable to only exchange EVPN routes of a particular type.
For e.g., a common deployment scenario for large DCs
is to sub-divide the DC into multiple PODs with full host mobility within a POD
(i.e., all subnets provisioned on all leaf switches within the POD)
but only do prefix-based routing across PODs.
This can be achieved by only exchanging EVPN type-5 routes across PODs.
Implement a policy to filter EVPN routes based on route type.
Ticket: CM-19394
Review: CCR-7139
Testing: Manual
Signed-off-by: Mitesh Kanjariya <mitesh@cumulusnetworks.com>
The bestpath multipath-relax setting was added to the output of
"show ip bgp neighbor json" several months ago but this is not
the correct place to display that information and this fix removes
it from there. The multipath-relax setting was also added
to the output of "show ip bgp sum json" which is fine.
Signed-off-by: Don Slice <dslice@cumulusnetworks.com>
If a peer already has keepalives turned on when asking to turn them on,
return immediately. Same thing for turning them off.
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
Use the new threading facilities provided in lib/ to streamline the
threads used in bgpd. In particular, all of the lifecycle code has been
removed from the I/O thread and replaced with the default loop. Did not
do the same to the keepalives thread as it is much smaller (doesn't need
the event system).
Also cleaned up some comments to match the style guide.
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
Ensure that spurious error messages are not generated in a non-EVPN configuration
when routes in a VRF get deleted or added. Also, check on EVPN advertisement
being enabled before walking VRF routing table.
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
Reviewed-by: Donald Sharp <sharpd@cumulusnetworks.com>
Ticket: CM-19206
Reviewed By: CCR-7073
Testing Done:
1. Recreate errors and validate fix
2. Type-5 route related testing - new routes, neighbor flap etc.
When a EVPN type-5 route is formed by using the source IP route's AS-path,
the AS-path is not locally generated and should not be "uninterned" (i.e.,
have its reference count incorrectly updated). An incorrect update of the
reference count can lead to asserts or crashes at a later stage.
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
Ticket: CM-19121
Reviewed By: CCR-7028
Testing Done:
1. Manual testing by Vivek and Anitha
2. No automation run since this area has no coverage yet
When an IPv4 or IPv6 unicast route is injected into EVPN as a type-5 route
(upon user configuration), ensure that the source route (best path)'s path
attributes are used to build the EVPN type-5 route. This will result in
correct AS_PATH and ORIGIN attributes for the type-5 route so that it doesn't
appear that all type-5 routes are locally sourced. This is necessary to
ensure that external paths (IPv4 or IPv6 from EBGP peer) are preferred over
internal EVPN paths, if both exist.
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
Ticket: CM-19051
Reviewed By: CCR-7009
Testing Done: Verify failed scenario
We bail in the command if no l2vnis are present.
This is incorrect as we now print both l2 and l3 vnis together.
Ticket: CM-19022
Review: Trivial
Testing: Manual
Signed-off-by: Mitesh Kanjariya <mitesh@cumulusnetworks.com>
0. move all global EVPN details to 'show evpn [json]' command
1. change "VRF" to "Tenant VRF" in 'show evpn vni'
2. change 'show vrf vni' command to tabular form
and add l3-vni related params to the output
3. show evpn rmac should show refcount only in detailed output
4. show evpn next-hop should show refcount only in detailed output
5. move VRF in 'show evpn l3vni' to the end
6. add num rmacs and num nexthops to show evpn l3vni
7. remove "info" from 'show bgp vrf <> l3vni info'
8. show evpn vni <vni> should show l2vni details or l3 vni details
9. show evpn vni should show both L2 and L3 VNIs
10. show bgp l2vpn evpn - shows all global bgp l2vpn evpn details
11. show bgp l2vpn evpn vni - will show both l2 and l3 vnis
12. show bgp l2vpn evpn vni - should show both l2 and l3 vnis
13. follow camel notation for all json keys
Signed-off-by: Mitesh Kanjariya <mitesh@cumulusnetworks.com>
In EVPN symmetric routing, not all subnets are presents everywhere.
We have multiple scenarios where a host might not get learned locally.
1. GARP miss
2. SVI down/up
3. Silent host
We need a mechanism to resolve such hosts. In order to achieve this,
we will be advertising a subnet route from a box and that box will help
in resolving the ARP to such hosts.
Signed-off-by: Mitesh Kanjariya <mitesh@cumulusnetworks.com>
When doing symmetric routing,
EVPN type-2 (MACIP) routes need to be advertised with two labels (VNIs)
the first being the L2 VNI (identifying the VLAN) and
the second being the L3 VNI (identifying the VRF).
The receive processing needs to handle one or two labels too.
Ticket: CM-18489
Review: CCR-6949
Testing: manual and bgp/evpn/mpls smoke
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
1. Added default gw extended community
2. code modification to handle sticky-mac/default-gw-mac as they go together
3. show command support for newly added extended community
4. State in zebra to reflect if a mac/neigh is default gateway
5. show command enhancement to refelect the same in zebra commands
Ticket: CM-17428
Review: CCR-6580
Testing: Manual
Signed-off-by: Mitesh Kanjariya <mitesh@cumulusnetworks.com>
If a BGP message header fails validation we send a BGP NOTIFICATION from
the I/O thread. At this time we clear the output buffer, push a
NOTIFICATION and then call the manual write function for errors. But in
between the push and the write the main thread could have pushed some
other message. Thus we need to hold the lock for the duration of the
function. TOCTTOU.
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
Abstract the code that sends the zapi message into zebra
for the turn on/off of nexthop tracking for a prefix.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The VRF_DEFAULT parameter is incorrectly used. The 0 value for the bgp
instance is passed instead.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
fixup bgpd: fix compilation issue with bgpd
Adds ability to specify that peers should be administratively shutdown
when first configured.
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
Remove the ability to attempt to configure a couple of options on
directly connected neighbors that don't make sense for them, as well as
the soft error handling code.
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
When a peer configured with administrative shutdown is added to a peer
group, the administrative shutdown status is discarded and the peer will
enter the BGP FSM. This is not what we want. Preserve the flag instead.
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
The constant to limit # of allowed cli tokens on any one line was
defined in multiple places, all inconsistent with each other. Fix.
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
Problem reported that when "systemctl restart networking" was
performed, prefixes previously redistributed into bgp from connected
were deleted from the bgp table. Determined that we were not correctly
changing the redistribution bitmask when the vrf_id of the vrf was
changed. This patch corrects that behavior.
Manual tests look good. bgp-min and vrf-min completed with no new failures.
Ticket: CM-19369
Signed-off-by: Don Slice <dslice@cumulusnetworks.com>
Reviewed-by: Donald Sharp <sharpd@cumulusnetworks.com>
If we have configured neighbor 1.1.1.1 for an afi/safi but they have not
activated that afi/safi with us then display "NoNeg" in the state column
of the summary output. This is to make troubleshooting afi/safi
easier.
Signed-off-by: Daniel Walton <dwalton@cumulusnetworks.com>
Condition needs to be set inside critical section, otherwise i/o thread
can deadlock. Also unlock mutex once finished with it, no need to hold
the lock for the life of the program.
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
The BGP IO thread must be running before other threads
can start using it. So at startup check to see
that it running once, instead of before every
function call into.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
-Werror=sign-compare is failing with signed/unsigned usage
in the conditional expression.
Fixes: #1593
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The multithreading code has a comment that reads:
"XXX: Heavy abuse of stream API. This needs a ring buffer."
This patch makes the relevant code use a ring buffer.
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
The bgpTimerUp value was incorrectly named, add
a correct name bgpTimerUpMsec and add some
code to allow for deprecation.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
For some reason bgp is calculating the peer uptime
in miliseconds incorrectly. Additionally we have
the peer_uptime function call that should be doing this!
But since we've choosen different names for the json output
we cannot fix it at this point.
uptime contains the number of seconds of uptime here. Just
multiply by 1k and display that( as peer_uptime does )
Fixes: #1585
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Problems reported with inconsistent use of parameters for bgp network
statements. Converted 12 DEFUNs to 2 DEFPY statements, making the
parameter use consistent with the exception of keeping the "backdoor"
keywork ipv4 only. Also verified that if a route-map or label-index
is specified in the "no" case it matches what had been previously
defined. Manual testing looks good and bgp-smoke will be performed
before pushing.
Ticket: CM-16860
Signed-off-by: Don Slice <dslice@cumulusnetworks.com>
Reviewed-by: CCR-7056
The json code was freeing json_paths and then
turning around and free'ing it again. Newer
versions of json-c have started to assert
this bad behavior.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
- used @sharpd's slack patch as a starting point
- fixes compile time issue, but code path not tested
Signed-off-by: Raymond P. Burkholder <github@oneunified.net>
Was using 0 as a sentinel value, so user couldn't configure 0 as the
value of the coalesce timer.
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
EVPN is only enabled when user configures advertise-all-vni.
All VNIs (L2 and L3) should be cleared upon removal of this config.
Signed-off-by: Mitesh Kanjariya <mitesh@cumulusnetworks.com>
When we receive an MP_UNREACH,
we try to uninstall routes from the VRF and the VNI.
The route-node in the VRF corresponds
to the ip prefix formed from EVPN prefix.
We should correctly form the prefix based on the EVPN route-type.
Signed-off-by: Mitesh Kanjariya <mitesh@cumulusnetworks.com>
for EVPN routes prefixlen filed in struct prefix
represents the sizeof of the struct rather than the actual prefix len.
This is later used in looking up route node in RIB.
Signed-off-by: Mitesh Kanjariya <mitesh@cumulusnetworks.com>
CLI config for enabling/disabling type-5 routes
router bgp <as> vrf <vrf>
address-family l2vpn evpn
[no] advertise <ipv4|ipv6|both>
loop through all the routes in VRF instance and advertise/withdraw
all ip routes as type-5 routes in default instance.
Signed-off-by: Mitesh Kanjariya <mitesh@cumulusnetworks.com>
For EVPN type-5 route the NH in the NLRI is set to the local tunnel ip.
This information has to be obtained from kernel notification.
We need to pass this info from zebra to bgp in l3vni call flow.
This patch doesn't handle the tunnel-ip change.
Signed-off-by: Mitesh Kanjariya <mitesh@cumulusnetworks.com>
1. VRF RD can be auto-derived (simillar to RD for a VNI)
2. VRF RD can be configured manually through a config
Signed-off-by: Mitesh Kanjariya <mitesh@cumulusnetworks.com>
currently, we have a rd_id bitfield
to assign an unique index for auto RD.
This bitfield currently resides under struct bgp which seems wrong.
We need to shift this to a global space
as this ID space is really global per box.
One more reason to keep it at a global data structure is,
the ID space could be used by both VNIs and VRFs.
Signed-off-by: Mitesh Kanjariya <mitesh@cumulusnetworks.com>
BGP VRF can be created/deleted either via config or via l3vni add/del.
We need to handle various sequences.
1. If user config is presented, an l3vni del should not delete the vrf instance
2. do not write bgp config in show running for auto created vrf
2. If l3vni present, disallow the cli for deleting bgp vrf instance
3. If l3vni is added and vrf config is present set the flags properly
4. if bgp vrf is configured unset the AUTO flag
Ticket: CM-18630
Review: CCR-6906
Testing: Manual
Signed-off-by: Mitesh Kanjariya <mitesh@cumulusnetworks.com>
Currently, kernel doesn't support ipv6 routes with ipv4 nexthops.
To avoid the crash,
we will only attach l3-vni related
RTs/ecommunities only to ipv4 host routes.
Ticket: CM-18743
Review: ccr-6912
Testing: Manaul
Signed-off-by: Mitesh Kanjariya <mitesh@cumulusnetworks.com>
Handle the return value of argv_find_and_parse_afi() to avoid passing
along bad values.
Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
This is a possible buffer overflow.
We should always use the buffer size (whenever possible) to tell
functions what the size of the buffer is, instead of a hardcoded value.
Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
When bgp has a metric butt-load of routes w/ ecmp, this command
can take an inordinate amount of time to run and complete via
vtysh.
Converting the bgp route output in this case back to not
using the json pretty-print drops ~2 minutes of runtime
off.
It is assumed that if users would like pretty output they
can run it through an appropriate tool via a pipe command.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The prefix_rd2str uses a buffer size of RD_ADDRSTRLEN.
Modify the code to consistently use this for all of BGP.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Make prefix_rd2str return an "Unknown" string when something
goes wrong. This will allow for simplification of the
code that uses prefix_rd2str.
Additionally ensure that size is big enough with an assert.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Ignoring the return from argv_find_and_parse_afi
makes the SA system assume that you could pass a AFI_MAX
value to bgp_show_route. Which in turn would cause
an array out of bounds read.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
PREFIX_STRLEN is the correct length for buffers needed to output
a prefix2str. Additionally cleanup some setting of the last
value to a `\0` this is handled by prefix2str.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Since coalesce time is now heuristically adjusted based on peer count,
we need to separate out specific configuration by the user from the
current value. Behavior established is to not adjust if the user has a
value set.
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
If we ever turn off assertion for production builds
this code as written will cause a crash in that
the assignment will not happen.
Modify the code such that this erroneous assumption
cannot happen.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Some of the deprecated stream.h macros see such little use that we may
as well just remove them and use the non-deprecated macros.
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
With the way things are set up, this bit of code would never actually
cause a deadlock, but would be highly likely in the future.
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
If we are in OpenSent or OpenConfirm peer state and we receive a new
address-family activation, we would end up ignoring the new activation
and not tell our peer about it. You could notice this by seeing
the fact that a 'show bgp neighbor' command returns a 'Not in
any update group' for a particular family.
This modifies the code such that we now notice that we are in
either OpenSent or OpenConfirm state and reset the peer to
allow us to send them the new capability.
Ticket: CM-19021
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
After a batch of generated UPDATEs, call bgp_writes_on() once instead of
after generating each packet.
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
The subgroup coalesce timer controls how long updates to a particular
subgroup are delayed in order to allow additional peers to join the
subgroup. Presently the timer value is 200 ms. Increase it to 1 second
and adjust up as peers are configured, with an upper cap at 10s.
This cuts convergence time by a factor of 3 at large scale (300+ peers,
1000+ prefixes per peer).
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
This is necessary because otherwise between the time we wipe the output
buffer and the time we push the NOTIFY onto it, the KA generation thread
could have pushed a KEEPALIVE in the middle.
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
In the same vein as the round-robin input commit, this re-adds logic for
limiting the amount of time spent generating UPDATEs per generation
cycle. Missed this when shifting around wpkt_quanta; prior to MT it
limited both calls to write() as well as UPDATE generation.
Unfortunately, batching input processing severely impacts BGP initial
convergence times. As a consequence of the way update-groups were
implemented, advancing the state of the routing table based on prefixes
learned from one peer prior to all (or at least most) peers establishing
connections will cause us to start generating outbound UPDATEs, which is
a very expensive operation at present. This intensive processing starves
out bgp_accept(), delaying connection of additional peers. When
additional peers do connect the problem gets worse and worse, yielding
approximately exponential growth in convergence time dependent on both
peering and prefix counts. This behavior is present pre-multithreading
as well, but batched input exacerbates it.
Round-robin input processing marginally harms convergence times for
small topologies but should allow much larger topologies to function
within reasonable performance thresholds.
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
Different places scheduling the same thread should use the same
semantics and thread type. Additionally providing the back reference
here makes sure we only schedule the job once and avoids flooding the
event queue with jobs to process an empty buffer.
Apparently I didn't fully understand how subgroup packets make their way
out to individual peers. Turns out (on the base branch) we just busy
poll while waiting for packets to make their way onto subgroup queues.
While this needs to be fixed in the future, for now readding this logic
fixes performance issues with convergence.
Instead of checking whether the post-write number of updates sent was
greater than the pre-write number of updates sent, it was comparing post
to zero. In effect this meant every time we wrote a packet it was
counted as an update for route advertisement timer purposes.
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
During initial session establishment, bgpd performs a "connection
transfer" to a new peer struct if the connection was initiated passively
(i.e. by the remote peer). With the addition of buffered input and a
reorganized packet processor, the following race condition manifests:
1. Remote peer initiates a connection. After exchanging OPEN messages,
we send them a KEEPALIVE. They send us a KEEPALIVE followed by
10,000 UPDATE messages. The I/O thread pushes these onto our local
peer's input buffer and schedules a packet processing job on the
main thread.
2. The packet job runs and processes the KEEPALIVE, which completes the
handshake on our end. As part of transferring to ESTABLISHED we
transfer all peer state to a new struct, as mentioned. Upon returning
from the KEEPALIVE processing routing, the peer context we had has
now been destroyed. We notice this and stop processing. Meanwhile
10k UPDATE messages are sitting on the input buffer.
3. N seconds later, the remote peer sends us a KEEPALIVE. The I/O thread
schedules another process job, which finds 10k UPDATEs waiting for
it. Convergence is achieved, but has been delayed by the value of the
KEEPALIVE timer.
The racey part is that if the remote peer takes a little bit of time to
send UPDATEs after KEEPALIVEs -- somewhere on the order of a few hundred
milliseconds -- we complete the transfer successfully and the packet
processing job is scheduled on the new peer upon arrival of the UPDATE
messages. Yuck.
The solution is to schedule a packet processing job on the new peer
struct after transferring state.
Lengthy commit message in case someone has to debug similar problems in
the future...
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
During initial session establishment, bgpd performs a "connection
transfer" to a new peer struct if the connection was initiated passively
(i.e. by the remote peer). With the addition of buffered input, I forgot
to transfer the raw input buffer to the new peer. This resulted in
infrequent failures during session handshaking whereby half of a packet
would be thrown away in the middle of a read causing us to send a NOTIFY
for an unsynchronized header. Usually the transfer coincided with a
clean input buffer, hence why it only showed up once in a while.
At some point when rearranging FSM code, bgpd lost the ability to
perform active opens because it was only paying attention to POLLIN and
not POLLOUT, when the latter is used to signify a successful connection
in the active case.
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
Use best-performing memory orders where appropriate.
Also update some style and add missing comments.
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
Per previous work to ensure all FSM state is updated after processing
each message, read-quanta should be safe to set > 1.
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
Slightly incorrect trigger for generating update group packets. In order
to match semantics of previous bgp_write() we need to trigger
update-group packet generation after every write operation, even if no
packets were written. Of course if we're tearing down the session we can
still skip this operation.
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
Removed in earlier version where the I/O pthread busy-waited for packets
to be posted to an output queue. Now that it's poll()-based, it's
necessary once again. Although this time we can say what we're actually
doing instead of a side effect of a write job.
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
* Keepalive on/off calls are necessary in certain cases due to screwy
fsm flow not turning them on after transferring a passive peer
connection in peer_xfer_conn
* Missed a case bgp_event_update() that resulted in a return code of -1
instead of BGP_Stop, which confuses the packet processing routine
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
Despaghettification of bgp_packet.c and bgp_fsm.c
Sometimes we call bgp_event_update() inline packet parsing.
Sometimes we post events instead.
Sometimes we increment packet counters in the FSM.
Sometimes we do it in packet routines.
Sometimes we update EOR's in FSM.
Sometimes we do it in packet routines.
Fix the madness.
bgp_process_packet() is now the centralized place to:
- Update message counters
- Execute FSM events in response to incoming packets
FSM events are now executed directly from this function instead of being
queued on the thread_master. This is to ensure that the FSM contains the
proper state after each packet is parsed. Otherwise there could be race
conditions where two packets are parsed in succession without the
appropriate FSM update in between, leading to session closure due to
receiving inappropriate messages for the current FSM state.
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
When terminating I/O thread, just schedule an event to do any necessary
cleanup and gracefully exit instead of using a signal.
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
* Start bit flags at 1, not 2
* Make run-flags atomic for i/o thread
* Remove work_cond mutex, it should no longer be necessary
* Add asserts to ensure proper ordering in bgp_connect()
* Use true/false with booleans, not 1/0
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
bgpd supports setting a write-quanta that serves as a hint on how many
packets to write per I/O cycle. Now that input is buffered, it makes
sense to add the equivalent parameter for how many packets are processed
per cycle. This is *not* how many packets are read off the wire per I/O
cycle; rather it is how many packets are processed from the input buffer
in a given cycle after having been read off the wire and sanitized.
Since these values must be used from multiple threads, they have also
been made atomic.
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
Instead of reading a packet header and the rest of the packet in two
separate i/o cycles, instead read a chunk of data at one time and then
parse as many packets as possible out of the chunk.
Also changes bgp_packet.c to batch process packets.
To avoid thrashing on useless mutex locks, the scheduling call for
bgp_process_packet has been changed to always succeed at the cost of no
longer being cancel-able. In this case this is acceptable; following the
pattern of other event-based callbacks, an additional check in
bgp_process_packet to ignore stray events is sufficient. Before deleting
the peer all events are cleared which provides the requisite ordering.
XXX: chunk hardcoded to 5, should use something similar to wpkt_quanta
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
* Move and modify all network input related code to bgp_io.c
* Add a real input buffer to `struct peer`
* Move connection initialization to its own thread.c task instead of
piggybacking off of bgp_read()
* Tons of little fixups
Primary changes are in bgp_packet.[ch], bgp_io.[ch], bgp_fsm.[ch].
Changes made elsewhere are almost exclusively refactoring peer->ibuf to
peer->curr since peer->ibuf is now the true FIFO packet input buffer
while peer->curr represents the packet currently being processed by the
main pthread.
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
After implement threading, bgp_packet.c was serving the double purpose
of consolidating packet parsing functionality and handling actual I/O
operations. This is somewhat messy and difficult to understand. I've
thus moved all code and data structures for handling threaded packet
writes to bgp_io.[ch].
Although bgp_io.[ch] only handles writes at the moment to keep the noise
on this commit series down, for organization purposes, it's probably
best to move bgp_read() and its trappings into here as well and
restructure that code so that read()'s happen in the pthread and packet
processing happens on the main thread.
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
On TCP connection failure during session setup, bgp_stop() checks
whether peer->t_read is non-null to know whether or not to unschedule
select() on peer->fd before calling close() on it. Using the API exposed
by thread.c instead of bgpd's wrapper macro BGP_READ_ON() results in
this thread value never being set, which causes bgp_stop() to skip the
cancellation of select() before calling close(). Subsequent calls to
select() on that fd crash the daemon.
Use the macro instead.
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
During transition from OpenConfirm -> Established, we wipe the peer stub's
output buffer. Because thread.c prioritizes I/O operations over regular
background threads and events, in a single threaded environment this ordering
meant that the output buffer would be happily empty at wipe time. In MT-land,
this convenient coincidence is no longer true; thus we need to make sure that
any packets remaining on the peer stub get transferred over to the peer proper.
Also removes misleading comment indicating that bgp_establish() sends a
keepalive packet. It does not.
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
If write() indicates that we should retry, just move along to the next
peer and come back later. No need to burn write() in a loop.
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
Changes all synchronization primitives to be dynamically allocated. This
should help catch any subtle errors in pthread lifecycles.
This change also pre-initializes synchronization primitives before
threads begin to run, eliminating a potential race condition that
probably would have caused a segfault on startup on a very fast box.
Also changes mutex and condition variable allocations to use
MTYPE_PTHREAD and updates tests to do the proper initializations.
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
* Remove t_write
* Remove t_keepalive
These have been replaced by pthreads and are no longer needed. Since
some code looks at these values to determine if the threads are
scheduled, also add a new bitfield to store the same information.
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
Removes the WiP shim and implements proper thread lifecycle management.
* Declare necessary pthread_t's in bgp_master
* Define new MTYPE in lib/thread.c for pthreads
* Allocate and free BGP's pthreads appropriately
* Terminate and join threads appropriately
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
This patch, in tandem with moving packet writes into a dedicated kernel
thread, fixes session flaps caused by long-running internal operations
starving the (old) userspace write thread.
BGP keepalives are now produced by a kernel thread and placed onto the
peer's output queue. These are then consumed by the write thread. Both
of these tasks are concurrent with the rest of bgpd, obviating the
session flaps described above.
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
Prior to this change, after initiating a nonblocking connection to the
remote peer bgpd would call both BGP_READ_ON and BGP_WRITE_ON on the
peer's socket. This resulted in a call to select(), so that when some
event (either a connection success or failure) occurred on the socket,
one of bgp_read() or bgp_write() would run. At the beginning of each of
those functions was a hook into bgp_connect_check(), which checked the
socket status and issued the correct connection event onto the BGP FSM.
This code is better suited for bgp_fsm.c. Placing it there avoids
scheduling packet reads or writes when we don't know if the socket has
established a connection yet, and the specific functionality is a better
fit for the responsibility scope of this unit.
This change also helps isolate the responsibilities of the
packet-writing kernel thread.
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
Prior to this change, packets generated for update groups were taken off
of the (independent) buffer for the update group, reformatted for the
specific peer under question and sent off inline with bgp_write(). Since
the operations of this code path can include the merging and pruning of
subgroups and are too large to safely synchronize, this change moves
that logic to execute after each tick of the write thread.
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
Allow the higher level protocol to specify if it would
like to receive notifications about it's routes that
it has installed.
I've purposely made it part of zclient_new_notify because
we need to track the routes on a per daemon basis only.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
So we have the ability to apply speculative route-maps to
neighbor display to see what the changes would look like
via some show commands. When we do this we make a
shallow copy of the attr data structure and then pass
it around for applying the routemap. After we've applied
this route-map and displayed it we really need to clean
up memory that the route-map application applied.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
When we are applying an experimental route-map type
to a peer( as part of a show command ) save and
restore the peer's ->rmap_type.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
When we display the advertised-routes and we don't have a route-map
to apply do not re-apply the route-map. This does two things:
1) Fixes a display issue with the show command.
2) More importantly stops leaking memory like a sieve for when
you have a full bgp table.
Fixes: #1345
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
This commit fixes a bug where json output would display
',,,,,,,' because we were deciding to not display information
about some routes due to a selection criteria.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Building a communities json object every time is
both expensive and memory wasteful. Modify
code to only build the json object when needed.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The creation of the json object for the aspath
is both memory intensive and expensive to
create. Only create the json object when
it is needed and stash it for further usage
at that point.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
When BGP is being redistributed prefixes, allow it to
understand the nexthop type.
This fixes the issue where a blackhole route was being interpreted
to having a nexthop of 1.0.0.0( ruh-roh!!! ). This broke
downstream neighbors as that they would receive a 1.0.0.0 nexthop,
which is bad, very very bad.
This commit sets us up for the future where we can match
a route-map against a nexthop type. In that bgp is
now at least nominally paying attention to the type.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The usage of XMALLOC for route_match_peer_compile causes
the pc->interface to be non-NULL. The code assumes that
pc->interface will be NULL.
Ticket: CM-18824
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Fixes a bug whereby all peer-groups would be shown even when a
particular peer-group was specified for display.
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
The flags value is not used for unregister events. Let's purposefully
not send anything and purposefully not accept non 0 for it.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
This commit adds support for the RTR protocol to receive ROA
information from a RPKI cache server. That information can than be used
to validate the BGP origin AS of IP prefixes.
Both features are implemented using [rtrlib](http://rtrlib.realmv6.org/).
Signed-off-by: Marcel Röthke <marcel.roethke@haw-hamburg.de>
A crafted BGP UPDATE with a malformed path attribute length field causes
bgpd to dump up to 65535 bytes of application memory and send it as the
data field in a BGP NOTIFY message, which is truncated to 4075 bytes
after accounting for protocol headers. After reading a malformed length
field, a NOTIFY is generated that is supposed to contain the problematic
data, but the malformed length field is inadvertently used to compute
how much data we send.
CVE-2017-15865
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
If the user has configured the ability to override
the capabilities or if the afi/safi passed as part
of the _MP capability is not understood, then we
can enter into an infinite loop as part of the
capability parsing.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
When we are displaying a extended community ECOMMUNITY_SITE_ORIGIN
the display sprintf is this:
len = sprintf(
str_buf + str_pnt,
"EVPN:%02x:%02x:%02x:%02x:%02x:%02x",
macaddr[0], macaddr[1], macaddr[2],
macaddr[3], macaddr[4], macaddr[5]);
The problem with this is that macaddr[0] is passed in as a integer
so the sprintf function thinks that the value to display is much
larger than it actually is. The ECOMMUNITY_STR_DEFAULT_LEN is 27
So the resulting string no-longer fits in memory and we write
off the end of the buffer and can crash. If we force the
passed in value to be a uint8_t then we get the expected output
since a single byte is displayed as 2 hex characters and the
resulting string fits in str_buf.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Problem reported that a receiver of a default route issued across bgp
unnumbered peering using default originate would have the route stay
as inactive. Discovered we were messing up the nexthop value sent to
the peer in this one particular case. Manual testing good, fix supplied
to the submitter and verified to resolve the problem. bgp-smoke
completed successfully.
Ticket: CM-18634
Signed-off-by: Don Slice <dslice@cumulusnetworks.com>
Reviewed-by: Donald Sharp <sharpd@cumulusnetworks.com>
When we fail to bind to port 179 we are left in a situation
where we have not saved the bgp pointer created and when
the bgp cli mode is exited we leak the memory.
Additionally there is no recovery situation here that
could be easily programmed without fundamentally changing
the code.
So let's exit and output to the log file some useful
information to hopefully clue the user in on what is
going wrong.
Fixes: #1130
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Problem reported that we weren't adjusting the keepalive timer
correctly when we negotiated a lower hold time learned from a
peer. While working on this, found we didn't do inheritance
correctly at all. This fix solves the first problem and also
ensures that the timers are configured correctly based on this
priority order - peer defined > peer-group defined > global config.
This fix also displays the timers as "configured" regardless of
which of the three locations above is used.
Ticket: CM-18408
Signed-off-by: Don Slice <dslice@cumulusnetworks.com>
Reviewed-by: CCR-6807
Testing-performed: Manual testing successful, fix tested by
submitter, bgp-smoke completed successfully
This issue was discovered on a live session with an extremely
old cisco 7206VXR router running 12.2(33)SRE4. The sending router
is sending us an empty NLRI that is MP_REACH. From RFC
exploration(thanks Russ!) it appears that this was
considered a 'valid' way to send EOR.
Following discussion decided that we should treat
this situation as a EOR marker instead of bringing
down the session.
Applying this fix on the FRR router seeing this issue
allows it to continue it's peering relationship with
the ASR. Since this is a point fix I do not see
a high likelihood of further fallout.
Fixes: #1258
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
When bgp is coming up and is reading a non-integrated config.
The bgp connection to zebra has not fully had a chance to start.
As such when a redistribute line is parsed the attempt is
made to install it but it was erroring out with a warning.
This caused the `redistribute XXX` line to create a error
message to the end user.
Since bgp calls zclient_send_reg_requests which re-registers
the redistribute call once the actual zebra connection is up
and once bgp comes alive this is ok.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
There are multiple places that we use route-maps in bgp
There is no need to limit the route-map 'match peer ...' command
to just import and export route-map types. I see need for
using this in table-maps as well.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Ignore the return value of some functions in the places we know they
can't fail, and other small fixes.
Regarding the change in bgpd/rfapi/rfapi_rib.c, asserting that
rfapiRaddr2Qprefix() didn't fail is the common idiom inside the rfapi
code.
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
These are mostly trivial fixes for leaks in the error path of some functions.
The changes in bgpd/bgp_mpath.c deserves a bit of explanation though. In
the bgp_info_mpath_aggregate_update() function, we were allocating memory
for the lcomm variable but doing nothing with it. Since the code for
communities, extended communities and large communities is pretty much
the same in this function, it's clear that this was a copy and paste
error where most of the ext. community code was copied but not all of
it as it should have been.
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
In some situations we already know the ifp and by extension
the ifindex there is no need to look it up for every
route we send to zebra.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Handle better stress situations when multiple peers are trying to
connect at the same time by bumping the TCP connection backlog limit.
This reduces the convergence time of BGPerf stress test.
Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
If upon bgp startup we have this config:
router bgp 64540
neighbor 192.168.201.134 remote-as external
!
address-family ipv4 unicast
no neighbor 192.168.201.134 activate
neighbor 192.168.201.134 route-map NEXTHOP in
exit-address-family
The route-map map pointer for the incoming(or outgoing)
filter was not being saved due to a pre-mature optimization
of not handling the routemap callback if the peer is not
activated. The function that handles the peers route-maps
is making sure that the peer is in established state
before attempting to actually apply anything so just
call it to set the map pointer.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>