Current code in bgp bestpath selection would accept the newest
locally originated path as the best path. Making the selection
non-deterministic. Modify the code to always come to the
same bestpath conclusion when you have multiple locally originated
paths in bestpath selection.
Before:
eva# conf
eva(config)# router bgp 323
eva(config-router)# address-family ipv4 uni
eva(config-router-af)# redistribute connected
eva(config-router-af)# network 192.168.161.0/24
eva(config-router-af)# do show bgp ipv4 uni 192.168.161.0
BGP routing table entry for 192.168.161.0/24
Paths: (2 available, best #1, table default)
Not advertised to any peer
Local
0.0.0.0(eva) from 0.0.0.0 (192.168.161.245)
Origin IGP, metric 0, weight 32768, valid, sourced, local, bestpath-from-AS Local, best (Origin)
Last update: Wed Sep 16 15:03:03 2020
Local
0.0.0.0(eva) from 0.0.0.0 (192.168.161.245)
Origin incomplete, metric 0, weight 32768, valid, sourced
Last update: Wed Sep 16 15:02:52 2020
eva(config-router-af)# no redistribute connected
eva(config-router-af)# do show bgp ipv4 uni 192.168.161.0
BGP routing table entry for 192.168.161.0/24
Paths: (1 available, best #1, table default)
Not advertised to any peer
Local
0.0.0.0(eva) from 0.0.0.0 (192.168.161.245)
Origin IGP, metric 0, weight 32768, valid, sourced, local, bestpath-from-AS Local, best (First path received)
Last update: Wed Sep 16 15:03:03 2020
eva(config-router-af)# redistribute connected
eva(config-router-af)# do show bgp ipv4 uni 192.168.161.0
BGP routing table entry for 192.168.161.0/24
Paths: (2 available, best #2, table default)
Not advertised to any peer
Local
0.0.0.0(eva) from 0.0.0.0 (192.168.161.245)
Origin incomplete, metric 0, weight 32768, valid, sourced
Last update: Wed Sep 16 15:03:32 2020
Local
0.0.0.0(eva) from 0.0.0.0 (192.168.161.245)
Origin IGP, metric 0, weight 32768, valid, sourced, local, bestpath-from-AS Local, best (Origin)
Last update: Wed Sep 16 15:03:03 2020
eva(config-router-af)#
Notice the route choosen depends on order received
Fixed behavior:
eva# conf
eva(config)# router bgp 323
eva(config-router)# address-family ipv4 uni
eva(config-router-af)# redistribute connected
eva(config-router-af)# network 192.168.161.0/24
eva(config-router-af)# do show bgp ipv4 uni 192.168.161.0
BGP routing table entry for 192.168.161.0/24
Paths: (2 available, best #1, table default)
Not advertised to any peer
Local
0.0.0.0(eva) from 0.0.0.0 (192.168.161.245)
Origin IGP, metric 0, weight 32768, valid, sourced, local, bestpath-from-AS Local, best (Origin)
Last update: Wed Sep 16 15:03:03 2020
Local
0.0.0.0(eva) from 0.0.0.0 (192.168.161.245)
Origin incomplete, metric 0, weight 32768, valid, sourced
Last update: Wed Sep 16 15:02:52 2020
eva(config-router-af)# no redistribute connected
eva(config-router-af)# do show bgp ipv4 uni 192.168.161.0
BGP routing table entry for 192.168.161.0/24
Paths: (1 available, best #1, table default)
Not advertised to any peer
Local
0.0.0.0(eva) from 0.0.0.0 (192.168.161.245)
Origin IGP, metric 0, weight 32768, valid, sourced, local, bestpath-from-AS Local, best (First path received)
Last update: Wed Sep 16 15:03:03 2020
eva(config-router-af)# redistribute connected
eva(config-router-af)# do show bgp ipv4 uni 192.168.161.0
BGP routing table entry for 192.168.161.0/24
Paths: (2 available, best #2, table default)
Not advertised to any peer
Local
0.0.0.0(eva) from 0.0.0.0 (192.168.161.245)
Origin incomplete, metric 0, weight 32768, valid, sourced
Last update: Wed Sep 16 15:03:32 2020
Local
0.0.0.0(eva) from 0.0.0.0 (192.168.161.245)
Origin IGP, metric 0, weight 32768, valid, sourced, local, bestpath-from-AS Local, best (Origin)
Last update: Wed Sep 16 15:03:03 2020
eva(config-router-af)#
Ticket: CM-31490
Found-by: Trey Aspelund <taspelund@nvidia.com>
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Add code to handle MED matching:
- When MED matches act as normal.
- When MED doesn't match do the following:
* Uninstall the aggregate route
* Unsuppress routes (if using summary-only)
- When MED didn't match, but now matches:
* Install the aggregate route
* Suppress all routes (if using summary-only)
Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
For `no router bgp` without ASN check candidate
config for default bgp instance presence to avoid
failure from checking backend db where bgp instance
may not be created.
This situation can be seen in transactional cli mode
with following config.
bharat(config)# router bgp 101
bharat(config-router)# exit
bharat(config)# no router bgp
% No BGP process is configured
bharat(config)# no router bgp
% No BGP process is configured
bharat(config)#
Signed-off-by: Chirag Shah <chirag@nvidia.com>
move `router bgp` nb callback at `bgp` node level
to have access to bgp context at neighbor and peer-group
level and align create/destroy callbacks call during
no router bgp.
Earlier `no router bgp` is performed first global destroy
callback is called which essentially removes `bgp context`
then it calls to remove (parallel nodes) neighbor and peer-group
which does not have access to bgp context.
Moving router bgp at bgp solves this destroy callback ordering issue.
Signed-off-by: Chirag Shah <chirag@nvidia.com>
Omit routing protocol augment name from callbacks name.
(Omitted: routing_control_plane_protocols_control_plane_protocol_)
Signed-off-by: Chirag Shah <chirag@nvidia.com>
On bgpd bootstrap register routing hook which ensures
only single bgp named instance created per vrf routing
hierarchy.
Signed-off-by: Chirag Shah <chirag@nvidia.com>
This commit contains splitting of auto-generated bgp northbound callbacks
into separate files.
Include the files into bgp makefile.
Signed-off-by: Chirag Shah <chirag@nvidia.com>
This should never happen; no need to debug guard it and it's not a
warning, if this isn't working then NHT is not working at all.
Signed-off-by: Quentin Young <qlyoung@nvidia.com>
If you are including a network statement of a /32
then the current bgp martian checks will match the /32
together.
Problem:
!
router bgp 3235
neighbor 192.168.161.2 remote-as external
neighbor 192.168.161.131 remote-as external
!
address-family ipv4 unicast
network 10.10.3.11/32
network 192.168.161.0/24
no neighbor 192.168.161.2 activate
neighbor 192.168.161.2 route-map BLUE in
exit-address-family
!
eva# show bgp ipv4 uni
BGP table version is 1, local router ID is 10.10.3.11, vrf id 0
Default local pref 100, local AS 3235
Status codes: s suppressed, d damped, h history, * valid, > best, = multipath,
i internal, r RIB-failure, S Stale, R Removed
Nexthop codes: @NNN nexthop's vrf id, < announce-nh-self
Origin codes: i - IGP, e - EGP, ? - incomplete
Network Next Hop Metric LocPrf Weight Path
10.10.3.11/32 0.0.0.0(eva) 0 32768 i
*> 192.168.161.0/24 0.0.0.0(eva) 0 32768 i
Displayed 2 routes and 2 total paths
eva# show bgp import-check-table
Current BGP import check cache:
192.168.161.0 valid [IGP metric 0], #paths 1
if enp39s0
Last update: Fri Sep 25 08:00:42 2020
10.10.3.11 valid [IGP metric 0], #paths 1
if lo
Last update: Fri Sep 25 08:00:42 2020
eva# show bgp ipv4 uni summ
BGP router identifier 10.10.3.11, local AS number 3235 vrf-id 0
BGP table version 1
RIB entries 3, using 576 bytes of memory
Peers 1, using 21 KiB of memory
Neighbor V AS MsgRcvd MsgSent TblVer InQ OutQ Up/Down State/PfxRcd PfxSnt
janelle(192.168.161.131) 4 60000 69 70 0 0 0 00:03:21 0 1
Total number of neighbors 1
When we are deciding that a nexthop is valid there is not much point in checking
that a static route has a martian nexthop or not, since we self derived it already.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
bgp->default_keepalive was not considered when setting
peer->v_keepalive, causing the effective keepalive interval to
always be (holdtime/3), even when default_keepalive < (holdtime/3).
This ensures that the default_keepalive is used when it's set and
is < (holdtime/3).
Signed-off-by: Trey Aspelund <taspelund@cumulusnetworks.com>
(cherry picked from commit d8bf8c6128f2e493d473148213bd663a500c7f73)
Problem found that if a router-id was not defined or derived
initially, the bgp->router_id would be set to 0x0 and used
for determining auto-rd values. When bgp received a subsequent
router-id update from zebra, bgp would not completely process
the update since it was treated as updating an already derived
router-id with a new value, which is not desired. This also
could leave the auto rd/rt inforamation missing or invalid in
some cases. This fix allows updating the derived router-id if
the previous value was 0/0.
Ticket: CM-31441
Signed-off-by: Don Slice <dslice@nvidia.com>
The pbra variable is already derefed in all paths to this spot
and as such we cannot be NULL at this point.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
When doing multiplication of (int) * (uint_8t) we can
have overflow and end up in a weird state. Intentionally
upgrade the type then do the math.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
The Solaris code has gone through a deprecation cycle. No-one
has said anything to us and worse of all we don't have any test
systems running Solaris to know if we are making changes that
are breaking on Solaris. Remove it from the system so
we can clean up a bit.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Add support for a BGP-wide setting to enter and exit graceful shutdown.
This will apply to all BGP peers across all BGP instances. Per-instance
configuration is disallowed if the BGP-wide setting is in effect.
Signed-off-by: Vivek Venkatraman <vivek@nvidia.com>
This function is poorly named; it's really used to allow the FSM to
decide the next valid state based on whether a peer has valid /
reachable nexthops as determined by NHT or BFD.
Signed-off-by: Quentin Young <qlyoung@nvidia.com>
The tip hash is only used when we are dealing with
evpn. In bgp_nexthop_self we are doing a memset
irrelevant of whether we will ever find data. Yes
hash_lookup will return pretty quickly.
Modify the code to avoid doing a memset in the case
where the tip hash is empty as that we know we'll
never find anything. With full BGP feeds this
small memset does take some time.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
This is needed to avoid mangling update-group which is used for many peers.
Sent prefix count is managed by update-groups.
Signed-off-by: Donatas Abraitis <donatas.abraitis@gmail.com>
Problem rerported that if you enter an existing community list
sequence number with new community information, the entire community
list would be deleted. This commit fixes the replace logic to do
the right thing.
Ticket: CM-30555
Signed-off-by: Don Slice <dslice@nvidia.com>
When installing rules pass by the interface name across
zapi.
This is being changed because we have a situation where
if you quickly create/destroy ephermeal interfaces under
linux the upper level protocol may be trying to add
a rule for a interface that does not quite exist
at the moment. Since ip rules actually want the
interface name ( to handle just this sort of situation )
convert over to passing the interface name and storing
it and using it in zebra.
Ticket: CM-31042
Signed-off-by: Stephen Worley <sworley@nvidia.com>
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
[no_]neighbor_nexthop_self_cmd & [no_]neighbor_nexthop_self_force_cmd
have duplicate install_element actions on the EVPN_NODE. This causes
duplicate command log errors which are caught by topotests. Remove
these.
Signed-off-by: Pat Ruddy <pat@voltanet.io>
There can be cases where evpn traffic is not meshed across various
endpoints, but sent to a central pe. For this situation, add the
configuration knobs to force nexthop attribute. Upon that change,
nexthop unchanged attribute is automatically disabled.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Enhancement to update-delay configuration to allow setting globally
rather than per-instance. Setting the update-delay is allowed either
per-vrf or globally, but not both at the same time.
Ticket: CM-31096
Signed-off-by: Don Slice <dslice@nvidia.com>
Attribute may not be long enough to contain a localpref value, resulting
in an assert on stream size. Gracefully handle this case instead.
Signed-off-by: Quentin Young <qlyoung@nvidia.com>
NLRI parsing for mpls vpn was missing several length checks that could
easily result in garbage heap reads past the end of nlri->packet.
Convert the whole function to use stream APIs for automatic bounds
checking...
Signed-off-by: Quentin Young <qlyoung@nvidia.com>
When using these flag #defines, by default their types are integers but
they are always used in conjunction with unsigned integers, which
introduces some implicit conversions that really ought to be avoided.
Signed-off-by: Quentin Young <qlyoung@nvidia.com>
bgp_attr_intern(attr) takes an attribute, duplicates it, and inserts it
into the attribute hash table, returning the inserted attr. This is done
when processing a bgp update. We store the returned attribute in the
path info struct. However, later on we modify one of the fields of the
attribute. This field is inspected by attrhash_cmp, the function that
allows the hash table to select the correct item from the hash chain for
a given key when doing a lookup on an item. By modifying the field after
it's been inserted, we open the possibility that two items in the same
chain that at insertion time were differential by attrhash_cmp becomes
equal according to that function. When performing subsequent hash
lookups, it is then indeterminate which of the equivalent items the hash
table will select from the chain (in practice it is the first one but
this may not be the one we want). Thus, it is illegal to modify
data used by a hash comparison function after inserting that data into
a hash table.
In fact this is occurring for attributes. We insert two attributes that
hash to the same key and thus end up in the same hash chain. Then we
modify one of them such that the two items now compare equal. Later one
we want to release the second item from the chain before XFREE()'ing it,
but since the two items compare equal we get the first item back, then
free the second one, which constitutes two bugs, the first being the
wrong attribute removed from the hash table and the second being a
dangling pointer stored in the hash table.
To rectify this we need to perform any modifications to an attr before
it is inserted into the table, i.e., before calling bgp_attr_intern().
This patch does that by moving the sole modification to the attr that
occurs after the insert (that I have seen) before that call.
Signed-off-by: Quentin Young <qlyoung@nvidia.com>
* Added vtysh cli commands and functions to set/unset bgp daemons no-rib
option during runtime and withdraw/announce routes in bgp instances
RIB from/to Zebra.
Signed-off-by: David Schweizer <dschweizer@opensourcerouting.org>
The bgpTrapBackwardTransition callback was being called only during
bgp_stop and only under the condition that peer status was Established.
The MIB defines that the event should be generated for every transition
of the BGP FSM from a higher to a lower state.
Signed-off-by: Babis Chalios <mail@bchalios.io>
When deleting a dynamic peer, unsetting md5 password would cause
it to be unset on the listener allowing unauthenticated connections
from any peer in the range.
Check for dynamic peers in peer delete and avoid this.
Signed-off-by: Pat Ruddy <pat@voltanet.io>
When setting authentication on a BGP peer in a VRF the listener is
looked up from a global list. However there is no check that the
listener is the one associated with the VRF being configured. This
can result in the wrong listener beiong configured with a password,
leaving the intended listener in an open authentication state.
To simplify this lookup stash a pointer to the bgp instance in
the listener on creating (in the same way as is done for NS-based
VRFS).
Signed-off-by: Pat Ruddy <pat@voltanet.io>
since the addition of srte_color to the comparison for bgp nexthops
it is possible to have several nexthops per prefix but since zebra
only sores a per prefix registration we should not unregister for
nh notifications for a prefix unti all the nexthops for that prefix
have been deleted. Otherwise we can get into a deadlock situation
where BGP thinks we have registered but we have unregistered from zebra.
Signed-off-by: Pat Ruddy <pat@voltanet.io>
Extend the NHT code so that only the affected BGP routes are affected
whenever an SR-policy is updated on zebra.
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
Example configuration:
route-map SET_SR_POLICY permit 10
set sr-te color 1
!
router bgp 1
bgp router-id 1.1.1.1
neighbor 2.2.2.2 remote-as 1
neighbor 2.2.2.2 update-source lo
address-family ipv4 unicast
neighbor 2.2.2.2 next-hop-self
neighbor 2.2.2.2 route-map SET_SR_POLICY in
exit-address-family
!
!
Learned BGP routes from 2.2.2.2 are mapped to the SR-TE Policy
which is uniquely determined by the BGP nexthop (2.2.2.2 in this
case) and the SR-TE color in the route-map.
Co-authored-by: Renato Westphal <renato@opensourcerouting.org>
Co-authored-by: GalaxyGorilla <sascha@netdef.org>
Co-authored-by: Sebastien Merle <sebastien@netdef.org>
Signed-off-by: Sebastien Merle <sebastien@netdef.org>
Fist, routing tables aren't the most appropriate data structure
to store nexthops and imported routes since we don't need to do
longest prefix matches with that information.
Second, by converting the NHT code to use rb-trees, we can index
the nexthops using additional information, not only the destination
address. This will be useful later to index bgpd's nexthops by
both destination and SR-TE color.
Co-authored-by: Sebastien Merle <sebastien@netdef.org>
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
json = NULL; is set in a loop above and here we are trying to check and
free the object again which is never be reached.
Signed-off-by: Donatas Abraitis <donatas.abraitis@gmail.com>
If you configure eBGP on loopbacks, you might miss setting the
ebgp-multihop option. Given that, the session will not be established
because of this. Now, the session is in Active state. When you update
your config afterwards and set the ebgp-multihop option to the
appropriate value, the session will still be in Active state. In fact,
it will be stuck in Active state and only services restart will help.
With this change, when set the ebgp-multihop option and no session was
established, reset the session.
Signed-off-by: Alexander Chernavin <achernavin@netgate.com>
If you advertise a default route (via default-originate) only if some
prefix is present in the BGP RIB (route-map specified) and this prefix
becomes unavailable, the default route keeps being advertised.
With this change, when we iterate over the BGP RIB to check if we can
advertise the default route, skip unavailable prefixes.
Signed-off-by: Alexander Chernavin <achernavin@netgate.com>
* Reverted back to using an ALIAS definition for the negated bgp
shutdown command with a concatenated message string.
* Unified cli command descriptions for bgp shutdown commands.
Signed-off-by: David Schweizer <dschweizer@opensourcerouting.org>
* Changed command description string to use "Remove" instead of
"Disable" to prevent user confusion due to double negation.
Signed-off-by: David Schweizer <dschweizer@opensourcerouting.org>
* Added a "no bgp shutdown message MSG..." cli command for ease of use
with copy/paste. Because of current limitations with DEFPY/ALIAS and
the message string concatenation, a new command instead of an ALIAS
had to be implemented.
Signed-off-by: David Schweizer <dschweizer@opensourcerouting.org>
This will check route-maps as well, not only prefix-lists, access-lists, and
filter-lists.
Signed-off-by: Donatas Abraitis <donatas.abraitis@gmail.com>
a dereference of null pointer exists in current flowspec code, with
prefix pointer. check validity of pointer before going ahead.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
because ecommunity structure can host both ext community and ipv6 ext
community, do not forget to set the unit_size field.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
because the same extended community can be used for storing ipv6 and
ipv4 et communities, the unit length must be stored. do not forget to
set the standard value in bgp evpn.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
if match protocol is icmp, then this protocol will be filtered with afi
= ipv4. however, if afi = ipv6, then the icmp protocol will fall back to
icmpv6.
note that this patch has also been done to simplify the policy routing,
as BGP will only handle TCP/UDP/ICMP(v4 or v6) protocols.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
the following 3 options are not supported in current implementation of
policy routing. for that, inform the user that the flowspec entry is
invalid when attempting to use :
- prefix offset with src, or dst ipv6 address ( see [1])
- flowlabel value - limitation due to [0]
- fragment ( implementation not done today).
[0] https://bugzilla.netfilter.org/show_bug.cgi?id=1375
[1] https://bugzilla.netfilter.org/show_bug.cgi?id=1373
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
in addition to ipv4 flowspec, ipv6 flowspec address family can configure
its own list of interfaces to monitor. this permits filtering the policy
routing only on some interfaces.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
rfc 5701 is supported. it is possible to configure in bgp vpn, a list of
route target with ipv6 external communities to import. it is to be noted
that this ipv6 external community has been developed only for matching a
bgp flowspec update with same ipv6 ext commmunity.
adding to this, draft-ietf-idr-flow-spec-v6-09 is implemented regarding
the redirect ipv6 option.
Practically, under bgp vpn, under ipv6 unicast, it is possible to
configure : [no] rt6 redirect import <IPV6>:<AS> values.
An incoming bgp update with fs ipv6 and that option matching a bgp vrf,
will be imported in that bgp vrf.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
in order to create appropriate policy route, family attribute is stored
in ipset and iptable zapi contexts. This commit also adds the flow label
attribute in iptables, for further usage.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
this commit supports [0] where ipv6 address is encoded in nexthop
attribute of nlri, and not in bgp redirect ip extended community. the
community contains only duplicate information or not.
Adding to this, because an action or a rule needs to apply to either
ipv4 or ipv6 flow, modify some internal structures so as to be aware of
which flow needs to be filtered. This work is needed when an ipv6
flowspec rule without ip addresses is mentioned, we need to know which
afi is served. Also, this work will be useful when doing redirect VRF.
[0] draft-simpson-idr-flowspec-redirect-02.txt
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
in ipv6 flowspec, a new type is defined to be able to do filtering rules
based on 20 bits flow label field as depicted in [0]. The change include
the decoding by flowspec, and the addition of a new attribute in policy
routing rule, so that the data is ready to be sent to zebra.
The commit also includes a check on fragment option, since dont fragment
bit does not exist in ipv6, the value should always be set to 0,
otherwise the flowspec rule becomes invalid.
[0] https://tools.ietf.org/html/draft-ietf-idr-flow-spec-v6-09
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>