==916511== 18 bytes in 2 blocks are definitely lost in loss record 7 of 147
==916511== at 0x483877F: malloc (vg_replace_malloc.c:307)
==916511== by 0x4BE0F0A: strdup (strdup.c:42)
==916511== by 0x48D66CE: qstrdup (memory.c:122)
==916511== by 0x1E6E31: bgp_vpn_leak_export (bgp_mplsvpn.c:2690)
==916511== by 0x28E892: bgp_router_create (bgp_nb_config.c:124)
==916511== by 0x48E05AB: nb_callback_create (northbound.c:869)
==916511== by 0x48E0FA2: nb_callback_configuration (northbound.c:1183)
==916511== by 0x48E13D0: nb_transaction_process (northbound.c:1308)
==916511== by 0x48E0137: nb_candidate_commit_apply (northbound.c:741)
==916511== by 0x48E024B: nb_candidate_commit (northbound.c:773)
==916511== by 0x48E6B21: nb_cli_classic_commit (northbound_cli.c:64)
==916511== by 0x48E757E: nb_cli_apply_changes (northbound_cli.c:281)
Signed-off-by: Chirag Shah <chirag@nvidia.com>
json_array_string_add is used to add a string entry into a JSON
list. This API is needed by zebra so moving it from bgpd to lib.
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
DF (Designated forwarder) election is used for picking a single
BUM-traffic forwarded per-ES. RFC7432 specifies a mechanism called
service carving for DF election. However that mechanism has many
disadvantages -
1. LBs poorly.
2. Doesn't allow for a controlled failover needed in upgrade
scenarios.
3. Not easy to hw accelerate.
To fix the poor performance of service carving alternate DF mechanisms
have been proposed via the following drafts -
draft-ietf-bess-evpn-df-election-framework
draft-ietf-bess-evpn-pref-df
This commit adds support for the pref-df election mechanism which
is used as the default. Other mechanisms including service-carving
may be added later.
In this mechanism one switch on an ES is elected as DF based on the
preference value; higher preference wins with IP address acting
as the tie-breaker (lower-IP wins if pref value is the same).
Sample output
=============
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
torm-11# sh bgp l2vpn evpn es 03:00:00:00:00:01:11:00:00:01
ESI: 03:00:00:00:00:01:11:00:00:01
Type: LR
RD: 27.0.0.15:6
Originator-IP: 27.0.0.15
Local ES DF preference: 100
VNI Count: 10
Remote VNI Count: 10
Inconsistent VNI VTEP Count: 0
Inconsistencies: -
VTEPs:
27.0.0.16 flags: EA df_alg: preference df_pref: 32767
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
torm-11# sh bgp l2vpn evpn route esi 03:00:00:00:00:01:11:00:00:01
*> [4]:[03:00:00:00:00:01:11:00:00:01]:[32]:[27.0.0.15]
27.0.0.15 32768 i
ET:8 ES-Import-Rt:00:00:00:00:01:11 DF: (alg: 2, pref: 100)
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
Convert IPv4 and IPv6 unicast address family clis
to transactional clis and implementation of
northbound callbacks.
Signed-off-by: Chirag Shah <chirag@nvidia.com>
When compiling w/ --enable-bfdd=no we get warnings
about functions not being used.
Add a #if check to include it as needed.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Problem found that turning an update-delay would only delay prefixes
learned from peers by delaying bestpath, but would allow local routes
(network statements or redistributed) to be immediately advertised,
followed by an End of Rib indicator. This fix delays sending local
routes until the update-delay process is completed, which matches
what testing shows other vendors do..
Ticket: CM-31743
Signed-off-by: Don Slice <dslice@nvidia.com>
- tracepoint() -> frrtrace()
- tracelog() -> frrtracelog()
- tracepoint_enabled() -> frrtrace_enabled()
Also removes copypasta'd #ifdefs for those LTTng macros, those are
handled in lib/trace.h
Signed-off-by: Quentin Young <qlyoung@nvidia.com>
Replace all lib/thread cancel macros, use thread_cancel()
everywhere. Only the THREAD_OFF macro and thread_cancel() api are
supported. Also adjust thread_cancel_async() to NULL caller's pointer (if
present).
Signed-off-by: Mark Stapp <mjs@voltanet.io>
Change thread_cancel to take a ** to an event, NULL-check
before dereferencing, and NULL the caller's pointer. Update
many callers to use the new signature.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
Instead of just counting the route suppressions, keep a reference for
all aggregations that are doing it. It should help the with the
following problems:
- Which aggregation suppressed the route.
- Double suppression
- Double unsuppression
- Avoids calling `bgp_process` if already suppressed/unsuppressed.
- Easier code maintenance and understanding
This also fixes a crash when modifying a route map that is
associated with a working aggregate-address.
Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
Add new aggregate-address option to selectively suppress routes based
on route map results.
Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
We currently have a global process queue for handling route
updates in bgp. This is fine, in general, except there are
places and times where we plug the queue for no new work
during certain peer states of bgp update delay. If we
happen to be processing multiple bgp instances on startup
why do we want to stop processing in vrf A when vrf B
is in a bit of a pickle?
Also this separation will allow us to start forward thinking
about how to fully integrate pthreads into route processing
in bgp.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
The MH datastructures were being released before the paths that were
referencing them. Fix is to do the MH cleanup last.
The MH finish function has also been stripped down to only do a
datastructure cleanup i.e. avoid sending route updates etc.
Ticket: 31376
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
Create appropriate accessor functions for the rn->lock
data. We should be accessing this data through accessor
functions since it is private data to the data structure.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Some more of the bgp_node usage snuck in from big commits in
the past month or so from feature work. Do some work
to put it back to bgp_dest for incoming future work.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Add numberical evpn route type support to some more
show commands.
Also, simplify some of the code there to call common type parsing
function. Some of the bounds checking there is also unncessary given
how our cli node matching works.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
In bgp global commands northbound local-as modify callback
check for backend db for checking existing bgp instance.
In an instance where no router bgp with old ASN cleaned up
followed by new bgp instance with new AS is created,
the nb_running_get_entry in validation phase returns stale
bgp reference, which leads to rejection of the router bgp command.
Uncovered via:
toptotest evpn_type5_test_topo1/test_evpn_type5_topo1.py
test_bgp_attributes_for_evpn_address_family_p1
Signed-off-by: Chirag Shah <chirag@nvidia.com>
bgp_show_neighbor_route() was rewriting safi from LU to uni
before checking if the peer was enabled for LU. This resulted
in the peer's address-family check looking for unicast, which
would always fail for LU peers since unicast + LU are
mutually-exclusive AFIs.
This moves this safi reassignment after the peer AFI check,
ensuring that the peer's address-family check looks for LU
while the call to bgp_show() still uses uni.
-- highlights from manual testing
config:
router bgp 2
neighbor 1.1.1.1 remote-as external
neighbor 1.1.1.1 disable-connected-check
neighbor 1.1.1.1 update-source 2.2.2.2
!
address-family ipv4 unicast
no neighbor 1.1.1.1 activate
exit-address-family
!
address-family ipv4 labeled-unicast
neighbor 1.1.1.1 activate
exit-address-family
before:
spine01# show bgp ipv4 unicast neighbors 1.1.1.1 routes
% No such neighbor or address family
spine01# show bgp ipv4 labeled-unicast neighbors 1.1.1.1 routes
% No such neighbor or address family
after:
spine01# show bgp ipv4 unicast neighbors 1.1.1.1 routes
% No such neighbor or address family
spine01# show bgp ipv4 label neighbors 1.1.1.1 routes
BGP table version is 1, local router ID is 2.2.2.2
Status codes: s suppressed, d damped, h history, * valid, > best, = multipath,
i internal, r RIB-failure, S Stale, R Removed
Origin codes: i - IGP, e - EGP, ? - incomplete
Network Next Hop Metric LocPrf Weight Path
*> 11.11.11.11/32 1.1.1.1 0 0 1 i
Displayed 1 routes and 1 total paths
Signed-off-by: Trey Aspelund <taspelund@cumulusnetworks.com>
if (pcout > (pcount * peer->max_threshold[afi][safi] / 100 ))
is always true. So the very first route received will always
trigger the warning. We actually want the warning to happen
when we hit the threshold.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
We have this pattern in the code base:
if (thread)
THREAD_OFF(thread);
If we look at THREAD_OFF we check to see if thread
is non-null too. So we have a double check.
This is unnecessary. Convert to just using THREAD_OFF
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Current code in bgp bestpath selection would accept the newest
locally originated path as the best path. Making the selection
non-deterministic. Modify the code to always come to the
same bestpath conclusion when you have multiple locally originated
paths in bestpath selection.
Before:
eva# conf
eva(config)# router bgp 323
eva(config-router)# address-family ipv4 uni
eva(config-router-af)# redistribute connected
eva(config-router-af)# network 192.168.161.0/24
eva(config-router-af)# do show bgp ipv4 uni 192.168.161.0
BGP routing table entry for 192.168.161.0/24
Paths: (2 available, best #1, table default)
Not advertised to any peer
Local
0.0.0.0(eva) from 0.0.0.0 (192.168.161.245)
Origin IGP, metric 0, weight 32768, valid, sourced, local, bestpath-from-AS Local, best (Origin)
Last update: Wed Sep 16 15:03:03 2020
Local
0.0.0.0(eva) from 0.0.0.0 (192.168.161.245)
Origin incomplete, metric 0, weight 32768, valid, sourced
Last update: Wed Sep 16 15:02:52 2020
eva(config-router-af)# no redistribute connected
eva(config-router-af)# do show bgp ipv4 uni 192.168.161.0
BGP routing table entry for 192.168.161.0/24
Paths: (1 available, best #1, table default)
Not advertised to any peer
Local
0.0.0.0(eva) from 0.0.0.0 (192.168.161.245)
Origin IGP, metric 0, weight 32768, valid, sourced, local, bestpath-from-AS Local, best (First path received)
Last update: Wed Sep 16 15:03:03 2020
eva(config-router-af)# redistribute connected
eva(config-router-af)# do show bgp ipv4 uni 192.168.161.0
BGP routing table entry for 192.168.161.0/24
Paths: (2 available, best #2, table default)
Not advertised to any peer
Local
0.0.0.0(eva) from 0.0.0.0 (192.168.161.245)
Origin incomplete, metric 0, weight 32768, valid, sourced
Last update: Wed Sep 16 15:03:32 2020
Local
0.0.0.0(eva) from 0.0.0.0 (192.168.161.245)
Origin IGP, metric 0, weight 32768, valid, sourced, local, bestpath-from-AS Local, best (Origin)
Last update: Wed Sep 16 15:03:03 2020
eva(config-router-af)#
Notice the route choosen depends on order received
Fixed behavior:
eva# conf
eva(config)# router bgp 323
eva(config-router)# address-family ipv4 uni
eva(config-router-af)# redistribute connected
eva(config-router-af)# network 192.168.161.0/24
eva(config-router-af)# do show bgp ipv4 uni 192.168.161.0
BGP routing table entry for 192.168.161.0/24
Paths: (2 available, best #1, table default)
Not advertised to any peer
Local
0.0.0.0(eva) from 0.0.0.0 (192.168.161.245)
Origin IGP, metric 0, weight 32768, valid, sourced, local, bestpath-from-AS Local, best (Origin)
Last update: Wed Sep 16 15:03:03 2020
Local
0.0.0.0(eva) from 0.0.0.0 (192.168.161.245)
Origin incomplete, metric 0, weight 32768, valid, sourced
Last update: Wed Sep 16 15:02:52 2020
eva(config-router-af)# no redistribute connected
eva(config-router-af)# do show bgp ipv4 uni 192.168.161.0
BGP routing table entry for 192.168.161.0/24
Paths: (1 available, best #1, table default)
Not advertised to any peer
Local
0.0.0.0(eva) from 0.0.0.0 (192.168.161.245)
Origin IGP, metric 0, weight 32768, valid, sourced, local, bestpath-from-AS Local, best (First path received)
Last update: Wed Sep 16 15:03:03 2020
eva(config-router-af)# redistribute connected
eva(config-router-af)# do show bgp ipv4 uni 192.168.161.0
BGP routing table entry for 192.168.161.0/24
Paths: (2 available, best #2, table default)
Not advertised to any peer
Local
0.0.0.0(eva) from 0.0.0.0 (192.168.161.245)
Origin incomplete, metric 0, weight 32768, valid, sourced
Last update: Wed Sep 16 15:03:32 2020
Local
0.0.0.0(eva) from 0.0.0.0 (192.168.161.245)
Origin IGP, metric 0, weight 32768, valid, sourced, local, bestpath-from-AS Local, best (Origin)
Last update: Wed Sep 16 15:03:03 2020
eva(config-router-af)#
Ticket: CM-31490
Found-by: Trey Aspelund <taspelund@nvidia.com>
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
When we enter `router bgp` it enters non-VRF instance which is default.
No need to check for VRF/VIEW name, kinda dead code.
Signed-off-by: Donatas Abraitis <donatas.abraitis@gmail.com>
Current code in bgp bestpath selection would accept the newest
locally originated path as the best path. Making the selection
non-deterministic. Modify the code to always come to the
same bestpath conclusion when you have multiple locally originated
paths in bestpath selection.
Before:
eva# conf
eva(config)# router bgp 323
eva(config-router)# address-family ipv4 uni
eva(config-router-af)# redistribute connected
eva(config-router-af)# network 192.168.161.0/24
eva(config-router-af)# do show bgp ipv4 uni 192.168.161.0
BGP routing table entry for 192.168.161.0/24
Paths: (2 available, best #1, table default)
Not advertised to any peer
Local
0.0.0.0(eva) from 0.0.0.0 (192.168.161.245)
Origin IGP, metric 0, weight 32768, valid, sourced, local, bestpath-from-AS Local, best (Origin)
Last update: Wed Sep 16 15:03:03 2020
Local
0.0.0.0(eva) from 0.0.0.0 (192.168.161.245)
Origin incomplete, metric 0, weight 32768, valid, sourced
Last update: Wed Sep 16 15:02:52 2020
eva(config-router-af)# no redistribute connected
eva(config-router-af)# do show bgp ipv4 uni 192.168.161.0
BGP routing table entry for 192.168.161.0/24
Paths: (1 available, best #1, table default)
Not advertised to any peer
Local
0.0.0.0(eva) from 0.0.0.0 (192.168.161.245)
Origin IGP, metric 0, weight 32768, valid, sourced, local, bestpath-from-AS Local, best (First path received)
Last update: Wed Sep 16 15:03:03 2020
eva(config-router-af)# redistribute connected
eva(config-router-af)# do show bgp ipv4 uni 192.168.161.0
BGP routing table entry for 192.168.161.0/24
Paths: (2 available, best #2, table default)
Not advertised to any peer
Local
0.0.0.0(eva) from 0.0.0.0 (192.168.161.245)
Origin incomplete, metric 0, weight 32768, valid, sourced
Last update: Wed Sep 16 15:03:32 2020
Local
0.0.0.0(eva) from 0.0.0.0 (192.168.161.245)
Origin IGP, metric 0, weight 32768, valid, sourced, local, bestpath-from-AS Local, best (Origin)
Last update: Wed Sep 16 15:03:03 2020
eva(config-router-af)#
Notice the route choosen depends on order received
Fixed behavior:
eva# conf
eva(config)# router bgp 323
eva(config-router)# address-family ipv4 uni
eva(config-router-af)# redistribute connected
eva(config-router-af)# network 192.168.161.0/24
eva(config-router-af)# do show bgp ipv4 uni 192.168.161.0
BGP routing table entry for 192.168.161.0/24
Paths: (2 available, best #1, table default)
Not advertised to any peer
Local
0.0.0.0(eva) from 0.0.0.0 (192.168.161.245)
Origin IGP, metric 0, weight 32768, valid, sourced, local, bestpath-from-AS Local, best (Origin)
Last update: Wed Sep 16 15:03:03 2020
Local
0.0.0.0(eva) from 0.0.0.0 (192.168.161.245)
Origin incomplete, metric 0, weight 32768, valid, sourced
Last update: Wed Sep 16 15:02:52 2020
eva(config-router-af)# no redistribute connected
eva(config-router-af)# do show bgp ipv4 uni 192.168.161.0
BGP routing table entry for 192.168.161.0/24
Paths: (1 available, best #1, table default)
Not advertised to any peer
Local
0.0.0.0(eva) from 0.0.0.0 (192.168.161.245)
Origin IGP, metric 0, weight 32768, valid, sourced, local, bestpath-from-AS Local, best (First path received)
Last update: Wed Sep 16 15:03:03 2020
eva(config-router-af)# redistribute connected
eva(config-router-af)# do show bgp ipv4 uni 192.168.161.0
BGP routing table entry for 192.168.161.0/24
Paths: (2 available, best #2, table default)
Not advertised to any peer
Local
0.0.0.0(eva) from 0.0.0.0 (192.168.161.245)
Origin incomplete, metric 0, weight 32768, valid, sourced
Last update: Wed Sep 16 15:03:32 2020
Local
0.0.0.0(eva) from 0.0.0.0 (192.168.161.245)
Origin IGP, metric 0, weight 32768, valid, sourced, local, bestpath-from-AS Local, best (Origin)
Last update: Wed Sep 16 15:03:03 2020
eva(config-router-af)#
Ticket: CM-31490
Found-by: Trey Aspelund <taspelund@nvidia.com>
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Add code to handle MED matching:
- When MED matches act as normal.
- When MED doesn't match do the following:
* Uninstall the aggregate route
* Unsuppress routes (if using summary-only)
- When MED didn't match, but now matches:
* Install the aggregate route
* Suppress all routes (if using summary-only)
Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
For `no router bgp` without ASN check candidate
config for default bgp instance presence to avoid
failure from checking backend db where bgp instance
may not be created.
This situation can be seen in transactional cli mode
with following config.
bharat(config)# router bgp 101
bharat(config-router)# exit
bharat(config)# no router bgp
% No BGP process is configured
bharat(config)# no router bgp
% No BGP process is configured
bharat(config)#
Signed-off-by: Chirag Shah <chirag@nvidia.com>
move `router bgp` nb callback at `bgp` node level
to have access to bgp context at neighbor and peer-group
level and align create/destroy callbacks call during
no router bgp.
Earlier `no router bgp` is performed first global destroy
callback is called which essentially removes `bgp context`
then it calls to remove (parallel nodes) neighbor and peer-group
which does not have access to bgp context.
Moving router bgp at bgp solves this destroy callback ordering issue.
Signed-off-by: Chirag Shah <chirag@nvidia.com>
Omit routing protocol augment name from callbacks name.
(Omitted: routing_control_plane_protocols_control_plane_protocol_)
Signed-off-by: Chirag Shah <chirag@nvidia.com>
On bgpd bootstrap register routing hook which ensures
only single bgp named instance created per vrf routing
hierarchy.
Signed-off-by: Chirag Shah <chirag@nvidia.com>
This commit contains splitting of auto-generated bgp northbound callbacks
into separate files.
Include the files into bgp makefile.
Signed-off-by: Chirag Shah <chirag@nvidia.com>
This should never happen; no need to debug guard it and it's not a
warning, if this isn't working then NHT is not working at all.
Signed-off-by: Quentin Young <qlyoung@nvidia.com>
If you are including a network statement of a /32
then the current bgp martian checks will match the /32
together.
Problem:
!
router bgp 3235
neighbor 192.168.161.2 remote-as external
neighbor 192.168.161.131 remote-as external
!
address-family ipv4 unicast
network 10.10.3.11/32
network 192.168.161.0/24
no neighbor 192.168.161.2 activate
neighbor 192.168.161.2 route-map BLUE in
exit-address-family
!
eva# show bgp ipv4 uni
BGP table version is 1, local router ID is 10.10.3.11, vrf id 0
Default local pref 100, local AS 3235
Status codes: s suppressed, d damped, h history, * valid, > best, = multipath,
i internal, r RIB-failure, S Stale, R Removed
Nexthop codes: @NNN nexthop's vrf id, < announce-nh-self
Origin codes: i - IGP, e - EGP, ? - incomplete
Network Next Hop Metric LocPrf Weight Path
10.10.3.11/32 0.0.0.0(eva) 0 32768 i
*> 192.168.161.0/24 0.0.0.0(eva) 0 32768 i
Displayed 2 routes and 2 total paths
eva# show bgp import-check-table
Current BGP import check cache:
192.168.161.0 valid [IGP metric 0], #paths 1
if enp39s0
Last update: Fri Sep 25 08:00:42 2020
10.10.3.11 valid [IGP metric 0], #paths 1
if lo
Last update: Fri Sep 25 08:00:42 2020
eva# show bgp ipv4 uni summ
BGP router identifier 10.10.3.11, local AS number 3235 vrf-id 0
BGP table version 1
RIB entries 3, using 576 bytes of memory
Peers 1, using 21 KiB of memory
Neighbor V AS MsgRcvd MsgSent TblVer InQ OutQ Up/Down State/PfxRcd PfxSnt
janelle(192.168.161.131) 4 60000 69 70 0 0 0 00:03:21 0 1
Total number of neighbors 1
When we are deciding that a nexthop is valid there is not much point in checking
that a static route has a martian nexthop or not, since we self derived it already.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
bgp->default_keepalive was not considered when setting
peer->v_keepalive, causing the effective keepalive interval to
always be (holdtime/3), even when default_keepalive < (holdtime/3).
This ensures that the default_keepalive is used when it's set and
is < (holdtime/3).
Signed-off-by: Trey Aspelund <taspelund@cumulusnetworks.com>
(cherry picked from commit d8bf8c6128f2e493d473148213bd663a500c7f73)
Problem found that if a router-id was not defined or derived
initially, the bgp->router_id would be set to 0x0 and used
for determining auto-rd values. When bgp received a subsequent
router-id update from zebra, bgp would not completely process
the update since it was treated as updating an already derived
router-id with a new value, which is not desired. This also
could leave the auto rd/rt inforamation missing or invalid in
some cases. This fix allows updating the derived router-id if
the previous value was 0/0.
Ticket: CM-31441
Signed-off-by: Don Slice <dslice@nvidia.com>
The pbra variable is already derefed in all paths to this spot
and as such we cannot be NULL at this point.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
When doing multiplication of (int) * (uint_8t) we can
have overflow and end up in a weird state. Intentionally
upgrade the type then do the math.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
The Solaris code has gone through a deprecation cycle. No-one
has said anything to us and worse of all we don't have any test
systems running Solaris to know if we are making changes that
are breaking on Solaris. Remove it from the system so
we can clean up a bit.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Add support for a BGP-wide setting to enter and exit graceful shutdown.
This will apply to all BGP peers across all BGP instances. Per-instance
configuration is disallowed if the BGP-wide setting is in effect.
Signed-off-by: Vivek Venkatraman <vivek@nvidia.com>
This function is poorly named; it's really used to allow the FSM to
decide the next valid state based on whether a peer has valid /
reachable nexthops as determined by NHT or BFD.
Signed-off-by: Quentin Young <qlyoung@nvidia.com>
The tip hash is only used when we are dealing with
evpn. In bgp_nexthop_self we are doing a memset
irrelevant of whether we will ever find data. Yes
hash_lookup will return pretty quickly.
Modify the code to avoid doing a memset in the case
where the tip hash is empty as that we know we'll
never find anything. With full BGP feeds this
small memset does take some time.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
This is needed to avoid mangling update-group which is used for many peers.
Sent prefix count is managed by update-groups.
Signed-off-by: Donatas Abraitis <donatas.abraitis@gmail.com>
Problem rerported that if you enter an existing community list
sequence number with new community information, the entire community
list would be deleted. This commit fixes the replace logic to do
the right thing.
Ticket: CM-30555
Signed-off-by: Don Slice <dslice@nvidia.com>
When installing rules pass by the interface name across
zapi.
This is being changed because we have a situation where
if you quickly create/destroy ephermeal interfaces under
linux the upper level protocol may be trying to add
a rule for a interface that does not quite exist
at the moment. Since ip rules actually want the
interface name ( to handle just this sort of situation )
convert over to passing the interface name and storing
it and using it in zebra.
Ticket: CM-31042
Signed-off-by: Stephen Worley <sworley@nvidia.com>
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
[no_]neighbor_nexthop_self_cmd & [no_]neighbor_nexthop_self_force_cmd
have duplicate install_element actions on the EVPN_NODE. This causes
duplicate command log errors which are caught by topotests. Remove
these.
Signed-off-by: Pat Ruddy <pat@voltanet.io>
There can be cases where evpn traffic is not meshed across various
endpoints, but sent to a central pe. For this situation, add the
configuration knobs to force nexthop attribute. Upon that change,
nexthop unchanged attribute is automatically disabled.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Enhancement to update-delay configuration to allow setting globally
rather than per-instance. Setting the update-delay is allowed either
per-vrf or globally, but not both at the same time.
Ticket: CM-31096
Signed-off-by: Don Slice <dslice@nvidia.com>
Attribute may not be long enough to contain a localpref value, resulting
in an assert on stream size. Gracefully handle this case instead.
Signed-off-by: Quentin Young <qlyoung@nvidia.com>
NLRI parsing for mpls vpn was missing several length checks that could
easily result in garbage heap reads past the end of nlri->packet.
Convert the whole function to use stream APIs for automatic bounds
checking...
Signed-off-by: Quentin Young <qlyoung@nvidia.com>
When using these flag #defines, by default their types are integers but
they are always used in conjunction with unsigned integers, which
introduces some implicit conversions that really ought to be avoided.
Signed-off-by: Quentin Young <qlyoung@nvidia.com>
bgp_attr_intern(attr) takes an attribute, duplicates it, and inserts it
into the attribute hash table, returning the inserted attr. This is done
when processing a bgp update. We store the returned attribute in the
path info struct. However, later on we modify one of the fields of the
attribute. This field is inspected by attrhash_cmp, the function that
allows the hash table to select the correct item from the hash chain for
a given key when doing a lookup on an item. By modifying the field after
it's been inserted, we open the possibility that two items in the same
chain that at insertion time were differential by attrhash_cmp becomes
equal according to that function. When performing subsequent hash
lookups, it is then indeterminate which of the equivalent items the hash
table will select from the chain (in practice it is the first one but
this may not be the one we want). Thus, it is illegal to modify
data used by a hash comparison function after inserting that data into
a hash table.
In fact this is occurring for attributes. We insert two attributes that
hash to the same key and thus end up in the same hash chain. Then we
modify one of them such that the two items now compare equal. Later one
we want to release the second item from the chain before XFREE()'ing it,
but since the two items compare equal we get the first item back, then
free the second one, which constitutes two bugs, the first being the
wrong attribute removed from the hash table and the second being a
dangling pointer stored in the hash table.
To rectify this we need to perform any modifications to an attr before
it is inserted into the table, i.e., before calling bgp_attr_intern().
This patch does that by moving the sole modification to the attr that
occurs after the insert (that I have seen) before that call.
Signed-off-by: Quentin Young <qlyoung@nvidia.com>
* Added vtysh cli commands and functions to set/unset bgp daemons no-rib
option during runtime and withdraw/announce routes in bgp instances
RIB from/to Zebra.
Signed-off-by: David Schweizer <dschweizer@opensourcerouting.org>
The bgpTrapBackwardTransition callback was being called only during
bgp_stop and only under the condition that peer status was Established.
The MIB defines that the event should be generated for every transition
of the BGP FSM from a higher to a lower state.
Signed-off-by: Babis Chalios <mail@bchalios.io>
When deleting a dynamic peer, unsetting md5 password would cause
it to be unset on the listener allowing unauthenticated connections
from any peer in the range.
Check for dynamic peers in peer delete and avoid this.
Signed-off-by: Pat Ruddy <pat@voltanet.io>
When setting authentication on a BGP peer in a VRF the listener is
looked up from a global list. However there is no check that the
listener is the one associated with the VRF being configured. This
can result in the wrong listener beiong configured with a password,
leaving the intended listener in an open authentication state.
To simplify this lookup stash a pointer to the bgp instance in
the listener on creating (in the same way as is done for NS-based
VRFS).
Signed-off-by: Pat Ruddy <pat@voltanet.io>
since the addition of srte_color to the comparison for bgp nexthops
it is possible to have several nexthops per prefix but since zebra
only sores a per prefix registration we should not unregister for
nh notifications for a prefix unti all the nexthops for that prefix
have been deleted. Otherwise we can get into a deadlock situation
where BGP thinks we have registered but we have unregistered from zebra.
Signed-off-by: Pat Ruddy <pat@voltanet.io>
Extend the NHT code so that only the affected BGP routes are affected
whenever an SR-policy is updated on zebra.
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
Example configuration:
route-map SET_SR_POLICY permit 10
set sr-te color 1
!
router bgp 1
bgp router-id 1.1.1.1
neighbor 2.2.2.2 remote-as 1
neighbor 2.2.2.2 update-source lo
address-family ipv4 unicast
neighbor 2.2.2.2 next-hop-self
neighbor 2.2.2.2 route-map SET_SR_POLICY in
exit-address-family
!
!
Learned BGP routes from 2.2.2.2 are mapped to the SR-TE Policy
which is uniquely determined by the BGP nexthop (2.2.2.2 in this
case) and the SR-TE color in the route-map.
Co-authored-by: Renato Westphal <renato@opensourcerouting.org>
Co-authored-by: GalaxyGorilla <sascha@netdef.org>
Co-authored-by: Sebastien Merle <sebastien@netdef.org>
Signed-off-by: Sebastien Merle <sebastien@netdef.org>
Fist, routing tables aren't the most appropriate data structure
to store nexthops and imported routes since we don't need to do
longest prefix matches with that information.
Second, by converting the NHT code to use rb-trees, we can index
the nexthops using additional information, not only the destination
address. This will be useful later to index bgpd's nexthops by
both destination and SR-TE color.
Co-authored-by: Sebastien Merle <sebastien@netdef.org>
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
json = NULL; is set in a loop above and here we are trying to check and
free the object again which is never be reached.
Signed-off-by: Donatas Abraitis <donatas.abraitis@gmail.com>
If you configure eBGP on loopbacks, you might miss setting the
ebgp-multihop option. Given that, the session will not be established
because of this. Now, the session is in Active state. When you update
your config afterwards and set the ebgp-multihop option to the
appropriate value, the session will still be in Active state. In fact,
it will be stuck in Active state and only services restart will help.
With this change, when set the ebgp-multihop option and no session was
established, reset the session.
Signed-off-by: Alexander Chernavin <achernavin@netgate.com>
If you advertise a default route (via default-originate) only if some
prefix is present in the BGP RIB (route-map specified) and this prefix
becomes unavailable, the default route keeps being advertised.
With this change, when we iterate over the BGP RIB to check if we can
advertise the default route, skip unavailable prefixes.
Signed-off-by: Alexander Chernavin <achernavin@netgate.com>
* Reverted back to using an ALIAS definition for the negated bgp
shutdown command with a concatenated message string.
* Unified cli command descriptions for bgp shutdown commands.
Signed-off-by: David Schweizer <dschweizer@opensourcerouting.org>
* Changed command description string to use "Remove" instead of
"Disable" to prevent user confusion due to double negation.
Signed-off-by: David Schweizer <dschweizer@opensourcerouting.org>
* Added a "no bgp shutdown message MSG..." cli command for ease of use
with copy/paste. Because of current limitations with DEFPY/ALIAS and
the message string concatenation, a new command instead of an ALIAS
had to be implemented.
Signed-off-by: David Schweizer <dschweizer@opensourcerouting.org>
This will check route-maps as well, not only prefix-lists, access-lists, and
filter-lists.
Signed-off-by: Donatas Abraitis <donatas.abraitis@gmail.com>
a dereference of null pointer exists in current flowspec code, with
prefix pointer. check validity of pointer before going ahead.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
because ecommunity structure can host both ext community and ipv6 ext
community, do not forget to set the unit_size field.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
because the same extended community can be used for storing ipv6 and
ipv4 et communities, the unit length must be stored. do not forget to
set the standard value in bgp evpn.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
if match protocol is icmp, then this protocol will be filtered with afi
= ipv4. however, if afi = ipv6, then the icmp protocol will fall back to
icmpv6.
note that this patch has also been done to simplify the policy routing,
as BGP will only handle TCP/UDP/ICMP(v4 or v6) protocols.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
the following 3 options are not supported in current implementation of
policy routing. for that, inform the user that the flowspec entry is
invalid when attempting to use :
- prefix offset with src, or dst ipv6 address ( see [1])
- flowlabel value - limitation due to [0]
- fragment ( implementation not done today).
[0] https://bugzilla.netfilter.org/show_bug.cgi?id=1375
[1] https://bugzilla.netfilter.org/show_bug.cgi?id=1373
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
in addition to ipv4 flowspec, ipv6 flowspec address family can configure
its own list of interfaces to monitor. this permits filtering the policy
routing only on some interfaces.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
rfc 5701 is supported. it is possible to configure in bgp vpn, a list of
route target with ipv6 external communities to import. it is to be noted
that this ipv6 external community has been developed only for matching a
bgp flowspec update with same ipv6 ext commmunity.
adding to this, draft-ietf-idr-flow-spec-v6-09 is implemented regarding
the redirect ipv6 option.
Practically, under bgp vpn, under ipv6 unicast, it is possible to
configure : [no] rt6 redirect import <IPV6>:<AS> values.
An incoming bgp update with fs ipv6 and that option matching a bgp vrf,
will be imported in that bgp vrf.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
in order to create appropriate policy route, family attribute is stored
in ipset and iptable zapi contexts. This commit also adds the flow label
attribute in iptables, for further usage.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
this commit supports [0] where ipv6 address is encoded in nexthop
attribute of nlri, and not in bgp redirect ip extended community. the
community contains only duplicate information or not.
Adding to this, because an action or a rule needs to apply to either
ipv4 or ipv6 flow, modify some internal structures so as to be aware of
which flow needs to be filtered. This work is needed when an ipv6
flowspec rule without ip addresses is mentioned, we need to know which
afi is served. Also, this work will be useful when doing redirect VRF.
[0] draft-simpson-idr-flowspec-redirect-02.txt
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
in ipv6 flowspec, a new type is defined to be able to do filtering rules
based on 20 bits flow label field as depicted in [0]. The change include
the decoding by flowspec, and the addition of a new attribute in policy
routing rule, so that the data is ready to be sent to zebra.
The commit also includes a check on fragment option, since dont fragment
bit does not exist in ipv6, the value should always be set to 0,
otherwise the flowspec rule becomes invalid.
[0] https://tools.ietf.org/html/draft-ietf-idr-flow-spec-v6-09
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
as per [0], ipv6 adress format introduces an ipv6 offset that needs to
be extracted too. The change include the validation, decoding for
further usage with policy-routing and decoding for dumping.
[0] https://tools.ietf.org/html/draft-ietf-idr-flow-spec-v6-09
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
until now, the assumption was done in bgp flowspec code that the
information contained was an ipv4 flowspec prefix. now that it is
possible to handle ipv4 or ipv6 flowspec prefixes, that information is
stored in prefix_flowspec attribute. Also, some unlocking is done in
order to process ipv4 and ipv6 flowspec entries.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Issue:
1. Initially BGP start listening to socket.
2. Start timer expires and BGP tries to connect to peer and moved
to Idle->connect (lets say peer datastructre X)
3. Connect for X succeeds and hence moved from idle ->connect with
FD-x.
4. A incoming connection is accepted and a new peer datastructure Y
is created with FD-y moves from idle->Active state.
5. Peer datastercture Y FD-y sends out OPEN and moves to
Active->Opensent state.
6. Peer datastrcture Y FD-y receives OPEN and moved from Opensent->
Openconfirm state.
7. Meanwhile on peer datastrcture X FD-x sends out a OPEN message
and moved from connect->Opensent.
8. For peer datastrcture Y FD-y keep alive is received and it is
moved from OpenConfirm->Established.
9. In this case peer datastructure Y FD-y is a accepted connection
so we try to copy all its parameter to peer datastructure X and
delete Y.
10. During this process TCP connection for the accepted connection
(FD-y) goes down and hence get remote address and port fails.
11. With this failure bgp_stop function for both peer datastrure X
and peer datastructure Y is called.
12. By this time all the parameters include state for datastrcture
for X and Y are exchanged. Peer Y FD-y when it entered this
function had state OpenConfirm still which has been moved to peer
datastrcture X.
13. In bgp_stop it will stop all the timers and take action only if
peer is in established state. Now that peer datastrcture X and Y
are not in established state (in this function) it will simply
close all timers and close the socket and assigns socket for both
the peer datastrcture to -1.
14. Peer datastrcture Y will be deleted as it is a datastrcture created
due to accept of connection where as peer datastrcture X will be held
as it is created with configuration.
15. Now peer datastrcture X now holds a state of OpenConfirm without any
timers running.
16. With this any new incoming connection will never be able to establish
as there is config connection X which is stuck in OpenConfirm.
Fix:
While transferring the peer datastructure Y FD-y (accepted connection)
to the peer datastructure X, if TCP connection for FD-y goes down, then
1. Call fsm event bgp_stop for X (do cleanup with bgp_stop and move the
state to Idle) and
2. Call fsm event bgp_stop for Y (do cleanup with bgp_stop and gets deleted
since it is an accept connection).
Signed-off-by: Sarita Patra <saritap@vmware.com>
Issue:
1. Initially BGP start listening to socket.
2. Start timer expires and BGP tries to connect to peer and moved
to Idle->connect (lets say peer datastructre X)
3. Peer datastrcture Y FD-X receives OPEN and moved from Opensent->
Openconfirm state and start the hold timer.
4. In the OpenConfirm state, the hold timer is stopped. So peer X
waits for Keepalive message from peer. If the Keepalive message
is not received, then it will be in OpenConfirm state for
indefinite time.
5. Due to this it neither close the existing connection nor it will
accept any connection from peer.
Fix:
In the OpenConfirm state, don't stop the hold timer.
1. Upon receipt of a neighbor’s Keepalive, the state is moved to
Established.
2. But If the hold timer expires, a stop event occurs, the state
is moved to Idle.
This is as per RFC.
Signed-off-by: Sarita Patra <saritap@vmware.com>
* Applied style suggestions by automated compliance check.
* Fixed function bgp_shutdown_enable to use immutable message string.
Signed-off-by: David Schweizer <dschweizer@opensourcerouting.org>
When iterating over a `show ip bgp vrf all neighbors json` command
bgp is crashing.
The json variable was being double freed. When freeing it, set it
to NULL and then check to make sure it exists before we free.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The RFC states:
The BGP Identifier is a 4-octet, unsigned, non-zero integer that
should be unique within an AS. The value of the BGP Identifier
for a BGP speaker is determined on startup and is the same for
every local interface and every BGP peer.
We were going slightly beyond this and ensuring that the address
was a specific range of addresses which is no longer relevant.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
* Replaced alias for bgp shutdown command with separate regular command
to prevent internal CLI errors.
Signed-off-by: David Schweizer <dschweizer@opensourcerouting.org>
* Peers are now automatically restarted by the reconnect timer instead
of a ManualStart event after lifting the administrative shutdown.
* Question of when to log what remains.
* Compiles and works as intended now.
Signed-off-by: David Schweizer <dschweizer@opensourcerouting.org>
* Fixed integration in FSM and packet handling.
* Added CLI "show" output, incl. JSON.
* For review and testing only.
Signed-off-by: David Schweizer <dschweizer@opensourcerouting.org>
* Changes allow administratively shutting down all peers of a BGP
instance.
* New CLI commands "[no] bgp shutdown" in vty shell.
* For review and testing only.
Signed-off-by: David Schweizer <dschweizer@opensourcerouting.org>
This would be handy for situations when a notification was sent, but it's
absolutely not clear who triggered that.
Just in case dumping all attributes under the debug mode would help finding
the _bad_ attribute.
Signed-off-by: Donatas Abraitis <donatas.abraitis@gmail.com>
* Removed old timer thread resets, since this has been taken care of
after execution of the threads by the thread_fetch function in
lib/thread.c for quite some time now.
Signed-off-by: David Schweizer <dschweizer@opensourcerouting.org>
Currently, bgpPeerTable only looks the default BGP instance. Most
vendors return all the available peers in this table. This commit
exposes all BGP instances.
The other tables are unchanged as it doesn't make sense to expose
routes from random VRFs into a single table. Vendors are using SNMP
contexts for that but we don't have support for it. Therefore, do
nothing.
Fix#6077
Signed-off-by: Vincent Bernat <vincent@bernat.ch>
SYNC routes are paths rxed from a local-ES peer. These routes result in
the installation of local dataplane entries i.e. with access port as
destination (vs. the remote-VTEP destination that results in the packet
being sent via the VxLAN overlay).
If a SYNC path is selected as the best path it is always turned around
into a local path which immediately lowers the status of the SYNC path
to non-best. However we need to keep track of the highest MM seq-number
and peer activity to continue advertising the local path. In order to
do that we need information from the "second-best" SYNC path to be
bubbled up to the local best path. This "SYNC" info is then consolidated
and sent to zebra which is responsible for the MM handling and local
path management.
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
When a SYNC route i.e. a route with a local ES as destination is
rxed on a switch (say L11) from an ES peer (say L12) a local
MAC/neigh entry is created on L11 with the local access port
as dest port.
Creation of the local entry triggers a local path advertisement from
L11. This could be a "locally-active" path or a "locally-inactive"
path. Inactive paths are advertised with the proxy bit.
To ensure that the local entry is not deleted by a SYNC route it is
given absolute precedence over peer-paths.
If there are two non-local paths with the same dest ES and same MM
seq number the non-proxy path is preferred. This is done to ensure
that we don't lose track of the peer-activity.
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
A new proxy flag has been added to the already existing NA extended
community to allow proxy advertisment of a local host by a VTEP that is
yet to indpendently establish local reachability.
Reference: draft-rbickhart-evpn-ip-mac-proxy-adv
The extendend mac-mobility sequence number needs to be synced across
the ES peers. However we cannot let a ES-peer path win over a local
path on the same ES. To accomplish that some parameters such as the
MM seq number are bubbled up from the non-best path to the local path.
This mechanism is explained further in the path-selection patch.
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
The `struct evpn_ead_addr` structure had a prefix length
associated with it. This value was only ever set never
used. Remove this from our system. The other
nice thing about this change is that it puts back
the sizeof struct route_node to 192 bytes.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
1. Sample ES display
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
torm-11# sh bgp l2vpn evpn es
ES Flags: L local, R remote, I inconsistent
VTEP Flags: E ESR/Type-4, A active nexthop
ESI Flags RD #VNIs VTEPs
03:00:00:00:00:01:11:00:00:01 LR 27.0.0.15:15 10 27.0.0.16(EA)
03:00:00:00:00:01:22:00:00:02 LR 27.0.0.15:16 10 27.0.0.16(EA)
03:00:00:00:00:01:22:00:00:03 LR 27.0.0.15:17 10 27.0.0.16(EA)
03:00:00:00:00:02:11:00:00:01 R - 10 27.0.0.17(A),27.0.0.18(A)
03:00:00:00:00:02:22:00:00:02 R - 10 27.0.0.17(A),27.0.0.18(A)
03:00:00:00:00:02:22:00:00:03 R - 10 27.0.0.17(A),27.0.0.18(A)
torm-11#
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
2. Sample ES-EVI display
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
torm-11# sh bgp l2vpn evpn es-evi
Flags: L local, R remote, I inconsistent
VTEP-Flags: E EAD-per-ES, V EAD-per-EVI
VNI ESI Flags VTEPs
1005 03:00:00:00:00:01:11:00:00:01 LR 27.0.0.16(EV)
1005 03:00:00:00:00:01:22:00:00:02 LR 27.0.0.16(EV)
1005 03:00:00:00:00:01:22:00:00:03 LR 27.0.0.16(EV)
1005 03:00:00:00:00:02:11:00:00:01 R 27.0.0.17(EV),27.0.0.18(EV)
1005 03:00:00:00:00:02:22:00:00:02 R 27.0.0.17(EV),27.0.0.18(EV)
1005 03:00:00:00:00:02:22:00:00:03 R 27.0.0.17(EV),27.0.0.18(EV)
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
3. Sample EAD route display
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
torm-11# sh bgp l2vpn evpn route type ead
BGP table version is 19, local router ID is 27.0.0.15
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal
Origin codes: i - IGP, e - EGP, ? - incomplete
EVPN type-1 prefix: [4]:[ESI]:[EthTag]:[IPlen]:[VTEP-IP]
EVPN type-2 prefix: [2]:[EthTag]:[MAClen]:[MAC]:[IPlen]:[IP]
EVPN type-3 prefix: [3]:[EthTag]:[IPlen]:[OrigIP]
EVPN type-4 prefix: [4]:[ESI]:[IPlen]:[OrigIP]
EVPN type-5 prefix: [5]:[EthTag]:[IPlen]:[IP]
Network Next Hop Metric LocPrf Weight Path
Extended Community
Route Distinguisher: 27.0.0.15:5
*> [1]:[0]:[03:00:00:00:00:01:11:00:00:01]:[128]:[0.0.0.0]
27.0.0.15 32768 i
ET:8 RT:5550:1009
*> [1]:[0]:[03:00:00:00:00:01:22:00:00:02]:[128]:[0.0.0.0]
27.0.0.15 32768 i
ET:8 RT:5550:1009
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
This is the base patch that brings in support for Type-1 routes.
It includes support for -
- Ethernet Segment (ES) management
- EAD route handling
- MAC-IP (Type-2) routes with a non-zero ESI i.e. Aliasing for
active-active multihoming
- Initial infra for consistency checking. Consistency checking
is a fundamental feature for active-active solutions like MLAG.
We will try to levarage the info in the EAD-ES/EAD-EVI routes to
detect inconsitencies in access config across VTEPs attached to
the same Ethernet Segment.
Functionality Overview -
========================
1. Ethernet segments are created in zebra and associated with
access VLANs. zebra sends that info as ES and ES-EVI objects to BGP.
2. BGP advertises EAD-ES and EAD-EVI routes for the locally attached
ethernet segments.
3. Similarly BGP processes EAD-ES and EAD-EVI routes from peers
and translates them into ES-VTEP objects which are then sent to zebra
as remote ESs.
4. Each ES in zebra is associated with a list of active VTEPs which
is then translated into a L2-NHG (nexthop group). This is the ES
"Alias" entry
5. MAC-IP routes with a non-zero ESI use the alias entry created in
(4.) to forward traffic i.e. a MAC-ECMP is done to these remote-ES
destinations.
EAD route management (route table and key) -
============================================
1. Local EAD-ES routes
a. route-table: per-ES route-table
key: {RD=ES-RD, ESI, ET=0xffffffff, VTEP-IP)
b. route-table: per-VNI route-table
Not added
c. route-table: global route-table
key: {RD=ES-RD, ESI, ET=0xffffffff)
2. Remote EAD-ES routes
a. route-table: per-ES route-table
Not added
b. route-table: per-VNI route-table
key: {RD=ES-RD, ESI, ET=0xffffffff, VTEP-IP)
c. route-table: global route-table
key: {RD=ES-RD, ESI, ET=0xffffffff)
3. Local EAD-EVI routes
a. route-table: per-ES route-table
Not added
b. route-table: per-VNI route-table
key: {RD=0, ESI, ET=0, VTEP-IP)
c. route-table: global route-table
key: {RD=L2-VNI-RD, ESI, ET=0)
4. Remote EAD-EVI routes
a. route-table: per-ES route-table
Not added
b. route-table: per-VNI route-table
key: {RD=0, ESI, ET=0, VTEP-IP)
c. route-table: global route-table
key: {RD=L2-VNI-RD, ESI, ET=0)
Please refer to bgp_evpn_mh.h for info on how the data-structures are
organized.
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
Add ESI as an inline attribute field along with the other EVPN
attributes. This may be re-worked when the rest of the EVPN
attributes find a new home.
Some cleanup has been done to get rid of stale/unused references
to ESI. And also to consolidate duplicate definitions of ES ID
types.
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
1. EAD routes require support for ESI_LABEL extended community. The
primary info in this EC is a flags the specifies if the ES is
Single-active or active-acive.
2. Also fixed up ES_IMPORT_RT string. Support was added a long time
ago for ESR/Type-4 routes but it has not really been exercised for
MH functionality till now.
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
Re-org only; no other code changes. This is being done to make maintanence
of MH functionality (which will have more code added to it) easy.
The code moved here was originally committed via -
'commit 50f74cf131 ("*: support for evpn type-4 route")'
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
Revert "zebra: support for macvlan interfaces"
This reverts commit bf69e212fd.
Revert "doc: add some documentation about bgp evpn netns support"
This reverts commit 89b97c33d7.
Revert "zebra: dynamically detect vxlan link interfaces in other netns"
This reverts commit de0ebb2540.
Revert "bgpd: sanity check when updating nexthop from bgp to zebra"
This reverts commit ee9633ed87.
Revert "lib, zebra: reuse and adapt ns_list walk functionality"
This reverts commit c4d466c830.
Revert "zebra: local mac entries populated in correct netnamespace"
This reverts commit 4042454891.
Revert "zebra: when parsing local entry against dad, retrieve config"
This reverts commit 3acc394bc5.
Revert "bgpd: evpn nexthop can be changed by default"
This reverts commit a2342a2412.
Revert "zebra: zvni_map_to_vlan() adaptation for all namespaces"
This reverts commit db81d18647.
Revert "zebra: add ns_id attribute to mac structure"
This reverts commit 388d5b438e.
Revert "zebra: bridge layer2 information records ns_id where bridge is"
This reverts commit b5b453a2d6.
Revert "zebra, lib: new API to get absolute netns val from relative netns val"
This reverts commit b6ebab34f6.
Revert "zebra, lib: store relative default ns id in each namespace"
This reverts commit 9d3555e06c.
Revert "zebra, lib: add an internal API to get relative default nsid in other ns"
This reverts commit 97c9e7533b.
Revert "zebra: map vxlan interface to bridge interface with correct ns id"
This reverts commit 7c990878f2.
Revert "zebra: fdb and neighbor table are read for all zns"
This reverts commit f8ed2c5420.
Revert "zebra: zvni_map_to_svi() adaptation for other network namespaces"
This reverts commit 2a9dccb647.
Revert "zebra: display interface slave type"
This reverts commit fc3141393a.
Revert "zebra: zvni_from_svi() adaptation for other network namespaces"
This reverts commit 6fe516bd4b.
Revert "zebra: importation of bgp evpn rt5 from vni with other netns"
This reverts commit 28254125d0.
Revert "lib, zebra: update interface name at netlink creation"
This reverts commit 1f7a68a2ff.
Signed-off-by: Pat Ruddy <pat@voltanet.io>
Added a macro to validate the v4 mapped v6 address.
Modified bgp receive & send updates for v4 mapped v6 address as
nexthop and installing it as recursive nexthop in RIB.
Minor change in fpm while sending the routes for nexthop as
v4 mapped v6 address.
Signed-off-by: Kaushik <kaushik@niralnetworks.com>
When we have a prefix that has been selected, note that that
particular flag has been set and give that information to the
end user.
eva# show bgp ipv4 uni neighbors 192.168.161.131 prefix-counts
Prefix counts for 192.168.161.131, IPv4 Unicast
PfxCt: 814246
Counts from RIB table walk:
Adj-in: 0
Damped: 0
Removed: 0
History: 0
Stale: 0
Valid: 814246
All RIB: 814246
PfxCt counted: 814246
PfxCt Best Selected: 0
Useable: 814246
eva# show bgp ipv4 uni neighbors 192.168.161.2 prefix-counts
Prefix counts for 192.168.161.2, IPv4 Unicast
PfxCt: 814070
Counts from RIB table walk:
Adj-in: 0
Damped: 0
Removed: 0
History: 0
Stale: 0
Valid: 814070
All RIB: 814070
PfxCt counted: 814070
PfxCt Best Selected: 814070
Useable: 814070
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Coverity has noticed that we are using bgp_evpn after
we have already NULL checked it one time. Add an assert
to make Coverity happy here, if we get to this point
something terrible has happened.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Coverity rightly points out that bgp_table_top might return
NULL and immediately deref'ing it might be a problem.
Add a bit of safety.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
I wanted to preserve the old code flow to see what might
be needed in the future in commit:
23ca3269da
Coverity doesn't like dead code. So let's comment it out.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
If _force_ is set, then ALL prefixes are counted for maximum instead of
accepted only. This is useful for cases where an inbound filter is applied,
but you want maximum-prefix to act on ALL (including filtered) prefixes.
For instance, we have a configuration like:
neighbor r1 maximum-prefix 10
neighbor r1 prefix-list custom in
!
ip prefix-list custom seq 1 permit 10.0.0.0/24
ip prefix-list custom seq 2 permit 10.0.1.0/24
This will accept only 2 prefixes and discard all others instead of
shutting down the session when 10 is reached.
With this new knob (force), we will count all received prefixes and shutdown
the session when 10 is reached.
The bigger problem is when you have lots of peers with full feed and such a
configuration like in an example.
This is kinda re-ordering of how to treat filter vs. maximum-prefix.
Signed-off-by: Donatas Abraitis <donatas.abraitis@gmail.com>
While checking my BGP debugging settings at the console, I noticed
this message was missing a newline. Add it to be consistent with the
other similar messages.
Signed-off-by: Russell Bryant <rbryant@redhat.com>
During perf testing of receiving and installing 7.5 million
routes into zebra it was noticed that memset in bgp_zebra_announce
was taking ~11% of the runtime. With this change bgp_zebra_announce
now no longer has any appreciable time spent in memset as reported
by perf. In addition bgp_zebra_announce run time in perf was
reduced by a composite amount.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
```
(gdb) bt
0 0x00007f45a6f0a781 in raise () from /lib/x86_64-linux-gnu/libc.so.6
1 0x00007f45a6ef455b in abort () from /lib/x86_64-linux-gnu/libc.so.6
2 0x00007f45a7781920 in core_handler (signo=11, siginfo=0x7fffac7b84b0, context=<optimized out>) at lib/sigevent.c:228
3 <signal handler called>
4 0x000055a4133c0f32 in bgp_table_stats (vty=vty@entry=0x55a415acb240, bgp=0x0, afi=AFI_IP, safi=SAFI_UNICAST, json_array=json_array@entry=0x0) at bgpd/bgp_route.c:11412
5 0x000055a4133c13fb in show_ip_bgp_afi_safi_statistics (self=<optimized out>, vty=0x55a415acb240, argc=6, argv=<optimized out>) at bgpd/bgp_route.c:10749
6 0x00007f45a773917d in cmd_execute_command_real (vline=vline@entry=0x55a415ab7e10, vty=vty@entry=0x55a415acb240, cmd=cmd@entry=0x0, filter=FILTER_RELAXED)
at lib/command.c:909
7 0x00007f45a773afdf in cmd_execute_command (vline=vline@entry=0x55a415ab7e10, vty=vty@entry=0x55a415acb240, cmd=0x0, vtysh=vtysh@entry=0) at lib/command.c:968
8 0x00007f45a773b135 in cmd_execute (vty=vty@entry=0x55a415acb240, cmd=cmd@entry=0x55a415ace950 "show ip bgp vrf all statistics", matched=matched@entry=0x0,
vtysh=vtysh@entry=0) at lib/command.c:1122
9 0x00007f45a7794d62 in vty_command (vty=vty@entry=0x55a415acb240, buf=0x55a415ace950 "show ip bgp vrf all statistics") at lib/vty.c:526
10 0x00007f45a7794fb6 in vty_execute (vty=vty@entry=0x55a415acb240) at lib/vty.c:1293
11 0x00007f45a7797804 in vtysh_read (thread=<optimized out>) at lib/vty.c:2126
12 0x00007f45a778f641 in thread_call (thread=thread@entry=0x7fffac7bb040) at lib/thread.c:1550
13 0x00007f45a775b6d8 in frr_run (master=0x55a415542820) at lib/libfrr.c:1098
14 0x000055a4133815d6 in main (argc=10, argv=0x7fffac7bb2a8) at bgpd/bgp_main.c:509
```
"show ip bgp vrf all statistics" should show statistics for all VRFs if "all"
is specified.
Signed-off-by: Donatas Abraitis <donatas.abraitis@gmail.com>
1) When a session comes up for a peer and if the peer has not adverised
the GR capabilities, BGP sends a request to Zebra to clear any
stale routes that might exist from that peer.
2) When OPEN message is received from the peer, clear the previously
advertised GR capability by the peer, if the lastest received
OPEN message does not contain the GR capability.
Signed-off-by: NaveenThanikachalam <nthanikachal@vmware.com>
Remove mid-string line breaks, cf. workflow doc:
.. [#tool_style_conflicts] For example, lines over 80 characters are allowed
for text strings to make it possible to search the code for them: please
see `Linux kernel style (breaking long lines and strings)
<https://www.kernel.org/doc/html/v4.10/process/coding-style.html#breaking-long-lines-and-strings>`_
and `Issue #1794 <https://github.com/FRRouting/frr/issues/1794>`_.
Scripted commit, idempotent to running:
```
python3 tools/stringmangle.py --unwrap `git ls-files | egrep '\.[ch]$'`
```
Signed-off-by: David Lamparter <equinox@diac24.net>
It's hard to cope with cases when next-hop is changed/unchanged or
peers are non-direct.
It would be better to show the hostname and nexthop IP address (both)
under `show bgp` to quickly identify the source and the real next-hop
of the route.
If `bgp default show-nexthop-hostname` is toggled the output looks like:
```
spine1-debian-9# show bgp
BGP table version is 1, local router ID is 2.2.2.2, vrf id 0
Default local pref 100, local AS 65002
Status codes: s suppressed, d damped, h history, * valid, > best, = multipath,
i internal, r RIB-failure, S Stale, R Removed
Nexthop codes: @NNN nexthop's vrf id, < announce-nh-self
Origin codes: i - IGP, e - EGP, ? - incomplete
Network Next Hop Metric LocPrf Weight Path
* 2a02:4780::/64 fe80::a00:27ff:fe09:f8a3(exit1-debian-9)
0 0 65001 ?
spine1-debian-9# show ip bgp
BGP table version is 5, local router ID is 2.2.2.2, vrf id 0
Default local pref 100, local AS 65002
Status codes: s suppressed, d damped, h history, * valid, > best, = multipath,
i internal, r RIB-failure, S Stale, R Removed
Nexthop codes: @NNN nexthop's vrf id, < announce-nh-self
Origin codes: i - IGP, e - EGP, ? - incomplete
Network Next Hop Metric LocPrf Weight Path
*> 10.255.255.0/24 192.168.0.1(exit1-debian-9)
0 0 65001 ?
```
Signed-off-by: Donatas Abraitis <donatas.abraitis@gmail.com>
BFD profiles can now be used on the interface level like this:
interface eth1
ip router isis 1
isis bfd
isis bfd profile default
Here the 'default' profile needs to be specified as usual in the
bfdd configuration.
Signed-off-by: GalaxyGorilla <sascha@netdef.org>
```
exit1-debian-9# show bgp summary
IPv4 Unicast Summary:
BGP router identifier 192.168.0.1, local AS number 100 vrf-id 0
BGP table version 8
RIB entries 15, using 2880 bytes of memory
Peers 2, using 43 KiB of memory
Neighbor V AS MsgRcvd MsgSent TblVer InQ OutQ Up/Down State/PfxRcd PfxSnt
192.168.0.2 4 200 10 6 0 0 0 00:00:35 8 8
2a02:4780::2 4 0 0 1 0 0 0 never Active 0
Total number of neighbors 2
exit1-debian-9# show bgp summary established
IPv4 Unicast Summary:
BGP router identifier 192.168.0.1, local AS number 100 vrf-id 0
BGP table version 8
RIB entries 15, using 2880 bytes of memory
Peers 2, using 43 KiB of memory
Neighbor V AS MsgRcvd MsgSent TblVer InQ OutQ Up/Down State/PfxRcd PfxSnt
192.168.0.2 4 200 10 6 0 0 0 00:00:39 8 8
Total number of neighbors 2
exit1-debian-9# show bgp summary failed
IPv4 Unicast Summary:
BGP router identifier 192.168.0.1, local AS number 100 vrf-id 0
BGP table version 8
RIB entries 15, using 2880 bytes of memory
Peers 2, using 43 KiB of memory
Neighbor EstdCnt DropCnt ResetTime Reason
2a02:4780::2 0 0 never Waiting for peer OPEN
Total number of neighbors 2
exit1-debian-9#
```
Signed-off-by: Donatas Abraitis <donatas.abraitis@gmail.com>