The tip hash is only used when we are dealing with
evpn. In bgp_nexthop_self we are doing a memset
regardless of whether we will ever find data. Yes,
hash_lookup will return pretty quickly.
Modify the code to avoid doing a memset in the case
where the tip hash is empty, since we know we'll
never find anything. With full BGP feeds this
small memset does take some time.
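A minimal sketch of the early-return idea, assuming a simplified table in place of the real tip hash. The struct names, the linear lookup and the memset_calls counter are hypothetical and exist only to make the skipped work observable.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the tunnel-IP hash and its entries. */
struct tip_entry {
    unsigned int addr;
};

struct tip_table {
    size_t count;
    struct tip_entry entries[8];
};

/* Instrumentation for this sketch only: how often the key was cleared. */
static unsigned long memset_calls;

static bool tip_lookup(const struct tip_table *t, const struct tip_entry *key)
{
    for (size_t i = 0; i < t->count; i++)
        if (t->entries[i].addr == key->addr)
            return true;
    return false;
}

bool nexthop_is_self_tip(const struct tip_table *t, unsigned int addr)
{
    struct tip_entry key;

    /* Empty table: nothing can match, so skip building the key. */
    if (t->count == 0)
        return false;

    memset(&key, 0, sizeof(key));
    memset_calls++;
    key.addr = addr;
    return tip_lookup(t, &key);
}
```

With an empty table the function returns before the key is cleared, which is the per-route saving described above.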
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
This is needed to avoid mangling update-group which is used for many peers.
Sent prefix count is managed by update-groups.
Signed-off-by: Donatas Abraitis <donatas.abraitis@gmail.com>
A problem was reported that if you enter an existing community list
sequence number with new community information, the entire community
list would be deleted. This commit fixes the replace logic to do
the right thing.
Ticket: CM-30555
Signed-off-by: Don Slice <dslice@nvidia.com>
When installing rules pass by the interface name across
zapi.
This is being changed because we have a situation where,
if you quickly create/destroy ephemeral interfaces under
linux, the upper level protocol may be trying to add
a rule for an interface that does not quite exist
at the moment. Since ip rules actually want the
interface name (to handle just this sort of situation),
convert over to passing the interface name and storing
it and using it in zebra.
Ticket: CM-31042
Signed-off-by: Stephen Worley <sworley@nvidia.com>
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
[no_]neighbor_nexthop_self_cmd & [no_]neighbor_nexthop_self_force_cmd
have duplicate install_element actions on the EVPN_NODE. This causes
duplicate command log errors which are caught by topotests. Remove
these.
Signed-off-by: Pat Ruddy <pat@voltanet.io>
There can be cases where evpn traffic is not meshed across various
endpoints, but sent to a central pe. For this situation, add the
configuration knobs to force nexthop attribute. Upon that change,
nexthop unchanged attribute is automatically disabled.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Enhancement to update-delay configuration to allow setting globally
rather than per-instance. Setting the update-delay is allowed either
per-vrf or globally, but not both at the same time.
Ticket: CM-31096
Signed-off-by: Don Slice <dslice@nvidia.com>
Attribute may not be long enough to contain a localpref value, resulting
in an assert on stream size. Gracefully handle this case instead.
Signed-off-by: Quentin Young <qlyoung@nvidia.com>
NLRI parsing for mpls vpn was missing several length checks that could
easily result in garbage heap reads past the end of nlri->packet.
Convert the whole function to use stream APIs for automatic bounds
checking...
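A minimal sketch of what bounds-checked reads buy over raw pointer arithmetic. The reader type and function names below are hypothetical; they are not the stream API the commit refers to, only an illustration of the same idea.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cursor over a received buffer. Invariant: pos <= len. */
struct reader {
    const uint8_t *data;
    size_t len; /* total bytes available */
    size_t pos; /* next byte to read */
};

/* Copy n bytes out if they are all available; otherwise leave the
 * cursor untouched and report failure. */
static bool reader_get(struct reader *r, uint8_t *out, size_t n)
{
    if (n > r->len - r->pos)
        return false;
    for (size_t i = 0; i < n; i++)
        out[i] = r->data[r->pos + i];
    r->pos += n;
    return true;
}

/* Read a 4-octet big-endian field. */
bool reader_get_u32(struct reader *r, uint32_t *out)
{
    uint8_t b[4];

    if (!reader_get(r, b, sizeof(b)))
        return false;
    *out = ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
           ((uint32_t)b[2] << 8) | (uint32_t)b[3];
    return true;
}
```

A short read becomes an error return the caller has to handle, instead of a read past the end of the buffer.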
Signed-off-by: Quentin Young <qlyoung@nvidia.com>
When using these flag #defines, by default their types are integers but
they are always used in conjunction with unsigned integers, which
introduces some implicit conversions that really ought to be avoided.
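A short sketch of the difference, using hypothetical flag names. The U suffix gives each constant unsigned type, so combining it with an unsigned flags field involves no signed-to-unsigned conversion.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag names for illustration. */
#define EXAMPLE_FLAG_FIRST (1U << 0)
/* Without the U suffix, 1 << 31 would shift into the sign bit of a
 * 32-bit int, which is undefined behaviour in C. */
#define EXAMPLE_FLAG_LAST (1U << 31)

uint32_t flags_set(uint32_t flags, uint32_t flag)
{
    return flags | flag;
}

uint32_t flags_unset(uint32_t flags, uint32_t flag)
{
    return flags & ~flag;
}

bool flags_check(uint32_t flags, uint32_t flag)
{
    return (flags & flag) != 0;
}
```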
Signed-off-by: Quentin Young <qlyoung@nvidia.com>
bgp_attr_intern(attr) takes an attribute, duplicates it, and inserts it
into the attribute hash table, returning the inserted attr. This is done
when processing a bgp update. We store the returned attribute in the
path info struct. However, later on we modify one of the fields of the
attribute. This field is inspected by attrhash_cmp, the function that
allows the hash table to select the correct item from the hash chain for
a given key when doing a lookup on an item. By modifying the field after
it's been inserted, we open the possibility that two items in the same
chain that at insertion time were differentiated by attrhash_cmp become
equal according to that function. When performing subsequent hash
lookups, it is then indeterminate which of the equivalent items the hash
table will select from the chain (in practice it is the first one but
this may not be the one we want). Thus, it is illegal to modify
data used by a hash comparison function after inserting that data into
a hash table.
In fact this is occurring for attributes. We insert two attributes that
hash to the same key and thus end up in the same hash chain. Then we
modify one of them such that the two items now compare equal. Later on
we want to release the second item from the chain before XFREE()'ing it,
but since the two items compare equal we get the first item back, then
free the second one, which constitutes two bugs, the first being the
wrong attribute removed from the hash table and the second being a
dangling pointer stored in the hash table.
To rectify this we need to perform any modifications to an attr before
it is inserted into the table, i.e., before calling bgp_attr_intern().
This patch does that by moving the sole modification to the attr that
occurs after the insert (that I have seen) before that call.
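The failure mode can be reproduced with a toy hash chain. This is a sketch with hypothetical types, not the attribute hash itself: both items are assumed to hash to the same chain, and the comparison function inspects a single field.

```c
#include <stdbool.h>
#include <stddef.h>

struct item {
    int cmp_field;     /* inspected by item_cmp() */
    struct item *next; /* chain link */
};

/* One hash chain; both items are assumed to hash to the same key. */
struct chain {
    struct item *head;
};

static bool item_cmp(const struct item *a, const struct item *b)
{
    return a->cmp_field == b->cmp_field;
}

/* Append to the end of the chain. */
void chain_insert(struct chain *c, struct item *it)
{
    struct item **pp = &c->head;

    while (*pp)
        pp = &(*pp)->next;
    it->next = NULL;
    *pp = it;
}

/* Return the first item in the chain that compares equal to key. */
struct item *chain_lookup(const struct chain *c, const struct item *key)
{
    for (struct item *it = c->head; it; it = it->next)
        if (item_cmp(it, key))
            return it;
    return NULL;
}
```

Insert two items with different cmp_field values, then change the second one's field to match the first: a lookup using the second item now returns the first. Releasing "the second item" from the chain would therefore unlink the wrong entry.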
Signed-off-by: Quentin Young <qlyoung@nvidia.com>
* Added vtysh cli commands and functions to set/unset the bgp daemon's
no-rib option during runtime and withdraw/announce routes in the bgp
instances' RIB from/to Zebra.
Signed-off-by: David Schweizer <dschweizer@opensourcerouting.org>
The bgpTrapBackwardTransition callback was being called only during
bgp_stop and only under the condition that peer status was Established.
The MIB defines that the event should be generated for every transition
of the BGP FSM from a higher to a lower state.
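A sketch of the condition the MIB wording implies. The enum and function names are hypothetical; the numeric ordering is assumed to follow the bgpPeerState values in BGP4-MIB (idle lowest, established highest).

```c
#include <stdbool.h>

/* FSM states in ascending order, numbered as in bgpPeerState. */
enum fsm_state {
    FSM_IDLE = 1,
    FSM_CONNECT,
    FSM_ACTIVE,
    FSM_OPENSENT,
    FSM_OPENCONFIRM,
    FSM_ESTABLISHED,
};

/* A backward transition is any move to a numerically lower state,
 * not just a move away from Established. */
bool fsm_is_backward_transition(enum fsm_state old_state,
                                enum fsm_state new_state)
{
    return new_state < old_state;
}
```

Checking this on every state change, rather than only in the stop path, also covers transitions such as OpenConfirm to Idle.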
Signed-off-by: Babis Chalios <mail@bchalios.io>
When deleting a dynamic peer, unsetting md5 password would cause
it to be unset on the listener allowing unauthenticated connections
from any peer in the range.
Check for dynamic peers in peer delete and avoid this.
Signed-off-by: Pat Ruddy <pat@voltanet.io>
When setting authentication on a BGP peer in a VRF the listener is
looked up from a global list. However there is no check that the
listener is the one associated with the VRF being configured. This
can result in the wrong listener being configured with a password,
leaving the intended listener in an open authentication state.
To simplify this lookup stash a pointer to the bgp instance in
the listener on creating (in the same way as is done for NS-based
VRFs).
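A sketch of the lookup once each listener carries a pointer to its owning instance. The struct and function names are hypothetical simplifications of the real listener list.

```c
#include <stddef.h>

struct bgp_inst {
    const char *vrf_name;
};

struct listener {
    int fd;
    struct bgp_inst *bgp; /* owning instance, stashed at creation */
};

/* Return the listener that belongs to the given instance, or NULL.
 * Matching on the stored pointer keeps a password change for one VRF
 * from landing on another VRF's listener. */
struct listener *listener_for_instance(struct listener *list, size_t n,
                                       const struct bgp_inst *bgp)
{
    for (size_t i = 0; i < n; i++)
        if (list[i].bgp == bgp)
            return &list[i];
    return NULL;
}
```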
Signed-off-by: Pat Ruddy <pat@voltanet.io>
Since the addition of srte_color to the comparison for bgp nexthops
it is possible to have several nexthops per prefix, but since zebra
only stores a per-prefix registration we should not unregister for
nh notifications for a prefix until all the nexthops for that prefix
have been deleted. Otherwise we can get into a deadlock situation
where BGP thinks we have registered but we have unregistered from zebra.
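The rule amounts to reference counting the nexthops that share a prefix. A minimal sketch, with a hypothetical struct standing in for the per-prefix registration state:

```c
#include <stdbool.h>

/* Hypothetical per-prefix registration state. */
struct prefix_reg {
    unsigned int nexthop_count; /* nexthops currently using this prefix */
    bool registered;            /* whether we are registered with zebra */
};

void prefix_reg_add_nexthop(struct prefix_reg *r)
{
    /* The first nexthop for the prefix triggers the registration. */
    if (r->nexthop_count++ == 0)
        r->registered = true;
}

void prefix_reg_del_nexthop(struct prefix_reg *r)
{
    if (r->nexthop_count == 0)
        return;
    /* Only the last nexthop going away triggers the unregister. */
    if (--r->nexthop_count == 0)
        r->registered = false;
}
```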
Signed-off-by: Pat Ruddy <pat@voltanet.io>
Extend the NHT code so that only the relevant BGP routes are affected
whenever an SR-policy is updated on zebra.
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
Example configuration:
route-map SET_SR_POLICY permit 10
set sr-te color 1
!
router bgp 1
bgp router-id 1.1.1.1
neighbor 2.2.2.2 remote-as 1
neighbor 2.2.2.2 update-source lo
address-family ipv4 unicast
neighbor 2.2.2.2 next-hop-self
neighbor 2.2.2.2 route-map SET_SR_POLICY in
exit-address-family
!
!
Learned BGP routes from 2.2.2.2 are mapped to the SR-TE Policy
which is uniquely determined by the BGP nexthop (2.2.2.2 in this
case) and the SR-TE color in the route-map.
Co-authored-by: Renato Westphal <renato@opensourcerouting.org>
Co-authored-by: GalaxyGorilla <sascha@netdef.org>
Co-authored-by: Sebastien Merle <sebastien@netdef.org>
Signed-off-by: Sebastien Merle <sebastien@netdef.org>
First, routing tables aren't the most appropriate data structure
to store nexthops and imported routes since we don't need to do
longest prefix matches with that information.
Second, by converting the NHT code to use rb-trees, we can index
the nexthops using additional information, not only the destination
address. This will be useful later to index bgpd's nexthops by
both destination and SR-TE color.
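An ordered tree only needs a total-order comparison over the key, which is what makes a compound key possible. A sketch of such a comparator, assuming a hypothetical key with an IPv4 destination and a color:

```c
#include <stdint.h>

/* Hypothetical compound key: destination address plus SR-TE color. */
struct nh_key {
    uint32_t addr;
    uint32_t srte_color;
};

/* Order by address first, then by color. Returns <0, 0 or >0. */
int nh_key_cmp(const struct nh_key *a, const struct nh_key *b)
{
    if (a->addr != b->addr)
        return a->addr < b->addr ? -1 : 1;
    if (a->srte_color != b->srte_color)
        return a->srte_color < b->srte_color ? -1 : 1;
    return 0;
}
```

Two entries with the same destination but different colors compare as distinct, so both can live in the tree, which a prefix-keyed routing table cannot express.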
Co-authored-by: Sebastien Merle <sebastien@netdef.org>
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
json = NULL; is set in a loop above and here we are trying to check and
free the object again, in a branch which can never be reached.
Signed-off-by: Donatas Abraitis <donatas.abraitis@gmail.com>
If you configure eBGP on loopbacks, you might miss setting the
ebgp-multihop option. In that case the session will not be established
and stays in Active state. When you update
your config afterwards and set the ebgp-multihop option to the
appropriate value, the session will still be in Active state. In fact,
it will be stuck in Active state and only a service restart will help.
With this change, when you set the ebgp-multihop option and no session
was established, the session is reset.
Signed-off-by: Alexander Chernavin <achernavin@netgate.com>
If you advertise a default route (via default-originate) only if some
prefix is present in the BGP RIB (route-map specified) and this prefix
becomes unavailable, the default route keeps being advertised.
With this change, when we iterate over the BGP RIB to check if we can
advertise the default route, skip unavailable prefixes.
Signed-off-by: Alexander Chernavin <achernavin@netgate.com>
* Reverted back to using an ALIAS definition for the negated bgp
shutdown command with a concatenated message string.
* Unified cli command descriptions for bgp shutdown commands.
Signed-off-by: David Schweizer <dschweizer@opensourcerouting.org>
* Changed command description string to use "Remove" instead of
"Disable" to prevent user confusion due to double negation.
Signed-off-by: David Schweizer <dschweizer@opensourcerouting.org>
* Added a "no bgp shutdown message MSG..." cli command for ease of use
with copy/paste. Because of current limitations with DEFPY/ALIAS and
the message string concatenation, a new command instead of an ALIAS
had to be implemented.
Signed-off-by: David Schweizer <dschweizer@opensourcerouting.org>
This will check route-maps as well, not only prefix-lists, access-lists, and
filter-lists.
Signed-off-by: Donatas Abraitis <donatas.abraitis@gmail.com>
a dereference of null pointer exists in current flowspec code, with
prefix pointer. check validity of pointer before going ahead.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
because ecommunity structure can host both ext community and ipv6 ext
community, do not forget to set the unit_size field.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
because the same extended community can be used for storing ipv6 and
ipv4 ext communities, the unit length must be stored. do not forget to
set the standard value in bgp evpn.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
if match protocol is icmp, then this protocol will be filtered with afi
= ipv4. however, if afi = ipv6, then the icmp protocol will fall back to
icmpv6.
note that this patch has also been done to simplify the policy routing,
as BGP will only handle TCP/UDP/ICMP(v4 or v6) protocols.
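A sketch of the fallback described above. The enum and function names are hypothetical; the protocol numbers are the IANA values for ICMP and ICMPv6.

```c
/* Hypothetical address-family tag for this sketch. */
enum afi_example {
    AFI_EXAMPLE_IP = 1,
    AFI_EXAMPLE_IP6 = 2,
};

#define PROTO_ICMP 1    /* IANA protocol number for ICMP */
#define PROTO_ICMPV6 58 /* IANA protocol number for ICMPv6 */

/* Return the protocol number to filter on: an icmp match on an ipv6
 * flow falls back to icmpv6, everything else is passed through. */
int flowspec_effective_proto(enum afi_example afi, int proto)
{
    if (afi == AFI_EXAMPLE_IP6 && proto == PROTO_ICMP)
        return PROTO_ICMPV6;
    return proto;
}
```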
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
the following 3 options are not supported in current implementation of
policy routing. for that, inform the user that the flowspec entry is
invalid when attempting to use :
- prefix offset with src, or dst ipv6 address ( see [1])
- flowlabel value - limitation due to [0]
- fragment ( implementation not done today).
[0] https://bugzilla.netfilter.org/show_bug.cgi?id=1375
[1] https://bugzilla.netfilter.org/show_bug.cgi?id=1373
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
in addition to ipv4 flowspec, ipv6 flowspec address family can configure
its own list of interfaces to monitor. this permits filtering the policy
routing only on some interfaces.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
rfc 5701 is supported. it is possible to configure in bgp vpn, a list of
route target with ipv6 extended communities to import. it is to be noted
that this ipv6 extended community has been developed only for matching a
bgp flowspec update with same ipv6 ext community.
adding to this, draft-ietf-idr-flow-spec-v6-09 is implemented regarding
the redirect ipv6 option.
Practically, under bgp vpn, under ipv6 unicast, it is possible to
configure : [no] rt6 redirect import <IPV6>:<AS> values.
An incoming bgp update with fs ipv6 and that option matching a bgp vrf,
will be imported in that bgp vrf.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
in order to create appropriate policy route, family attribute is stored
in ipset and iptable zapi contexts. This commit also adds the flow label
attribute in iptables, for further usage.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
this commit supports [0] where ipv6 address is encoded in nexthop
attribute of nlri, and not in bgp redirect ip extended community. the
community contains only duplicate information or not.
Adding to this, because an action or a rule needs to apply to either
ipv4 or ipv6 flow, modify some internal structures so as to be aware of
which flow needs to be filtered. This work is needed when an ipv6
flowspec rule without ip addresses is mentioned, we need to know which
afi is served. Also, this work will be useful when doing redirect VRF.
[0] draft-simpson-idr-flowspec-redirect-02.txt
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
in ipv6 flowspec, a new type is defined to be able to do filtering rules
based on the 20-bit flow label field as described in [0]. The change includes
the decoding by flowspec, and the addition of a new attribute in policy
routing rule, so that the data is ready to be sent to zebra.
The commit also includes a check on fragment option, since dont fragment
bit does not exist in ipv6, the value should always be set to 0,
otherwise the flowspec rule becomes invalid.
[0] https://tools.ietf.org/html/draft-ietf-idr-flow-spec-v6-09
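A sketch of the two validity checks mentioned above. The names are hypothetical, and the position of the DF bit in the fragment bitmask (least significant bit) is an assumption of this sketch rather than something stated in the commit.

```c
#include <stdbool.h>
#include <stdint.h>

#define FLOW_LABEL_MAX 0xFFFFFu /* largest value that fits in 20 bits */
#define FRAG_DF_BIT 0x01u       /* assumed position of the DF bit */

/* The IPv6 flow label is a 20-bit field. */
bool flow_label_is_valid(uint32_t label)
{
    return label <= FLOW_LABEL_MAX;
}

/* IPv6 has no dont-fragment bit, so an ipv6 rule that sets it is
 * treated as invalid. */
bool ipv6_fragment_bitmask_is_valid(uint8_t bitmask)
{
    return (bitmask & FRAG_DF_BIT) == 0;
}
```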
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
as per [0], ipv6 address format introduces an ipv6 offset that needs to
be extracted too. The change includes the validation, decoding for
further usage with policy-routing and decoding for dumping.
[0] https://tools.ietf.org/html/draft-ietf-idr-flow-spec-v6-09
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
until now, the assumption was done in bgp flowspec code that the
information contained was an ipv4 flowspec prefix. now that it is
possible to handle ipv4 or ipv6 flowspec prefixes, that information is
stored in prefix_flowspec attribute. Also, some unlocking is done in
order to process ipv4 and ipv6 flowspec entries.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Issue:
1. Initially BGP starts listening on the socket.
2. Start timer expires and BGP tries to connect to the peer and moves
from Idle->Connect (let's say peer datastructure X).
3. Connect for X succeeds and hence it moves from Idle->Connect with
FD-x.
4. An incoming connection is accepted and a new peer datastructure Y
is created with FD-y; it moves from Idle->Active state.
5. Peer datastructure Y FD-y sends out OPEN and moves from
Active->OpenSent state.
6. Peer datastructure Y FD-y receives OPEN and moves from OpenSent->
OpenConfirm state.
7. Meanwhile peer datastructure X FD-x sends out an OPEN message
and moves from Connect->OpenSent.
8. For peer datastructure Y FD-y a keepalive is received and it
moves from OpenConfirm->Established.
9. In this case peer datastructure Y FD-y is an accepted connection
so we try to copy all its parameters to peer datastructure X and
delete Y.
10. During this process the TCP connection for the accepted connection
(FD-y) goes down and hence getting the remote address and port fails.
11. With this failure the bgp_stop function is called for both peer
datastructure X and peer datastructure Y.
12. By this time all the parameters, including state, of datastructures
X and Y have been exchanged. Peer Y FD-y still had state OpenConfirm
when it entered this function, and that state has been moved to peer
datastructure X.
13. bgp_stop stops all the timers and takes action only if the peer
is in Established state. Since peer datastructures X and Y are not
in Established state (in this function), it simply stops all timers,
closes the socket and sets the socket of both peer datastructures
to -1.
14. Peer datastructure Y is deleted as it was created by accepting a
connection, whereas peer datastructure X is kept as it was created
from configuration.
15. Peer datastructure X now holds a state of OpenConfirm without any
timers running.
16. With this, any new incoming connection will never be able to establish
as the configured connection X is stuck in OpenConfirm.
Fix:
While transferring the peer datastructure Y FD-y (accepted connection)
to the peer datastructure X, if TCP connection for FD-y goes down, then
1. Call fsm event bgp_stop for X (do cleanup with bgp_stop and move the
state to Idle) and
2. Call fsm event bgp_stop for Y (do cleanup with bgp_stop and gets deleted
since it is an accept connection).
Signed-off-by: Sarita Patra <saritap@vmware.com>
Issue:
1. Initially BGP starts listening on the socket.
2. Start timer expires and BGP tries to connect to the peer and moves
from Idle->Connect (let's say peer datastructure X).
3. Peer datastructure Y FD-X receives OPEN and moves from OpenSent->
OpenConfirm state and starts the hold timer.
4. In the OpenConfirm state, the hold timer is stopped. So peer X
waits for a Keepalive message from the peer. If the Keepalive message
is not received, it stays in OpenConfirm state indefinitely.
5. Due to this it neither closes the existing connection nor accepts
any connection from the peer.
Fix:
In the OpenConfirm state, don't stop the hold timer.
1. Upon receipt of a neighbor’s Keepalive, the state is moved to
Established.
2. But if the hold timer expires, a stop event occurs and the state
is moved to Idle.
This is as per RFC.
Signed-off-by: Sarita Patra <saritap@vmware.com>
* Applied style suggestions by automated compliance check.
* Fixed function bgp_shutdown_enable to use immutable message string.
Signed-off-by: David Schweizer <dschweizer@opensourcerouting.org>
When iterating over a `show ip bgp vrf all neighbors json` command
bgp is crashing.
The json variable was being double freed. When freeing it, set it
to NULL and then check to make sure it exists before we free.
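The pattern is the usual free-and-clear idiom. A generic sketch, with a hypothetical struct in place of the json object and plain free() in place of the json library's release call:

```c
#include <stdlib.h>

/* Hypothetical heap object standing in for the json document. */
struct doc {
    int dummy;
};

/* Free the object and clear the caller's pointer, so a second call
 * in a later loop iteration is a harmless no-op. */
void doc_free(struct doc **d)
{
    if (*d == NULL)
        return;
    free(*d);
    *d = NULL;
}
```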
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The RFC states:
The BGP Identifier is a 4-octet, unsigned, non-zero integer that
should be unique within an AS. The value of the BGP Identifier
for a BGP speaker is determined on startup and is the same for
every local interface and every BGP peer.
We were going slightly beyond this and ensuring that the address
was a specific range of addresses which is no longer relevant.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
* Replaced alias for bgp shutdown command with separate regular command
to prevent internal CLI errors.
Signed-off-by: David Schweizer <dschweizer@opensourcerouting.org>
* Peers are now automatically restarted by the reconnect timer instead
of a ManualStart event after lifting the administrative shutdown.
* Question of when to log what remains.
* Compiles and works as intended now.
Signed-off-by: David Schweizer <dschweizer@opensourcerouting.org>
* Fixed integration in FSM and packet handling.
* Added CLI "show" output, incl. JSON.
* For review and testing only.
Signed-off-by: David Schweizer <dschweizer@opensourcerouting.org>
* Changes allow administratively shutting down all peers of a BGP
instance.
* New CLI commands "[no] bgp shutdown" in vty shell.
* For review and testing only.
Signed-off-by: David Schweizer <dschweizer@opensourcerouting.org>
This would be handy for situations when a notification was sent, but it's
absolutely not clear who triggered that.
Just in case dumping all attributes under the debug mode would help finding
the _bad_ attribute.
Signed-off-by: Donatas Abraitis <donatas.abraitis@gmail.com>
* Removed old timer thread resets, since this has been taken care of
after execution of the threads by the thread_fetch function in
lib/thread.c for quite some time now.
Signed-off-by: David Schweizer <dschweizer@opensourcerouting.org>
Currently, bgpPeerTable only looks at the default BGP instance. Most
vendors return all the available peers in this table. This commit
exposes all BGP instances.
The other tables are unchanged as it doesn't make sense to expose
routes from random VRFs into a single table. Vendors are using SNMP
contexts for that but we don't have support for it. Therefore, do
nothing.
Fix #6077
Signed-off-by: Vincent Bernat <vincent@bernat.ch>
SYNC routes are paths rxed from a local-ES peer. These routes result in
the installation of local dataplane entries i.e. with access port as
destination (vs. the remote-VTEP destination that results in the packet
being sent via the VxLAN overlay).
If a SYNC path is selected as the best path it is always turned around
into a local path which immediately lowers the status of the SYNC path
to non-best. However we need to keep track of the highest MM seq-number
and peer activity to continue advertising the local path. In order to
do that we need information from the "second-best" SYNC path to be
bubbled up to the local best path. This "SYNC" info is then consolidated
and sent to zebra which is responsible for the MM handling and local
path management.
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
When a SYNC route i.e. a route with a local ES as destination is
rxed on a switch (say L11) from an ES peer (say L12) a local
MAC/neigh entry is created on L11 with the local access port
as dest port.
Creation of the local entry triggers a local path advertisement from
L11. This could be a "locally-active" path or a "locally-inactive"
path. Inactive paths are advertised with the proxy bit.
To ensure that the local entry is not deleted by a SYNC route it is
given absolute precedence over peer-paths.
If there are two non-local paths with the same dest ES and same MM
seq number the non-proxy path is preferred. This is done to ensure
that we don't lose track of the peer-activity.
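The two precedence rules above can be sketched as a partial comparison. The struct and function names are hypothetical, and only these two rules are modelled; everything else is left to the rest of best-path selection by returning 0.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical view of the path attributes the two rules look at. */
struct mh_path {
    bool local;      /* locally originated path */
    bool proxy;      /* advertised with the proxy bit */
    uint32_t mm_seq; /* MAC mobility sequence number */
    uint8_t esi[10]; /* destination Ethernet Segment ID */
};

/* Returns 1 if a is preferred, -1 if b is preferred, and 0 if the
 * two rules sketched here do not decide. */
int mh_path_prefer(const struct mh_path *a, const struct mh_path *b)
{
    /* Rule 1: a local path has absolute precedence over peer paths. */
    if (a->local != b->local)
        return a->local ? 1 : -1;

    /* Rule 2: two non-local paths with the same ES and the same MM
     * sequence number: prefer the non-proxy one. */
    if (!a->local && a->mm_seq == b->mm_seq &&
        memcmp(a->esi, b->esi, sizeof(a->esi)) == 0 &&
        a->proxy != b->proxy)
        return a->proxy ? -1 : 1;

    return 0;
}
```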
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
A new proxy flag has been added to the already existing NA extended
community to allow proxy advertisement of a local host by a VTEP that is
yet to independently establish local reachability.
Reference: draft-rbickhart-evpn-ip-mac-proxy-adv
The extended mac-mobility sequence number needs to be synced across
the ES peers. However we cannot let an ES-peer path win over a local
path on the same ES. To accomplish that some parameters such as the
MM seq number are bubbled up from the non-best path to the local path.
This mechanism is explained further in the path-selection patch.
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
The `struct evpn_ead_addr` structure had a prefix length
associated with it. This value was only ever set, never
used. Remove it from our system. The other
nice thing about this change is that it puts
the sizeof struct route_node back to 192 bytes.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
1. Sample ES display
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
torm-11# sh bgp l2vpn evpn es
ES Flags: L local, R remote, I inconsistent
VTEP Flags: E ESR/Type-4, A active nexthop
ESI Flags RD #VNIs VTEPs
03:00:00:00:00:01:11:00:00:01 LR 27.0.0.15:15 10 27.0.0.16(EA)
03:00:00:00:00:01:22:00:00:02 LR 27.0.0.15:16 10 27.0.0.16(EA)
03:00:00:00:00:01:22:00:00:03 LR 27.0.0.15:17 10 27.0.0.16(EA)
03:00:00:00:00:02:11:00:00:01 R - 10 27.0.0.17(A),27.0.0.18(A)
03:00:00:00:00:02:22:00:00:02 R - 10 27.0.0.17(A),27.0.0.18(A)
03:00:00:00:00:02:22:00:00:03 R - 10 27.0.0.17(A),27.0.0.18(A)
torm-11#
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
2. Sample ES-EVI display
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
torm-11# sh bgp l2vpn evpn es-evi
Flags: L local, R remote, I inconsistent
VTEP-Flags: E EAD-per-ES, V EAD-per-EVI
VNI ESI Flags VTEPs
1005 03:00:00:00:00:01:11:00:00:01 LR 27.0.0.16(EV)
1005 03:00:00:00:00:01:22:00:00:02 LR 27.0.0.16(EV)
1005 03:00:00:00:00:01:22:00:00:03 LR 27.0.0.16(EV)
1005 03:00:00:00:00:02:11:00:00:01 R 27.0.0.17(EV),27.0.0.18(EV)
1005 03:00:00:00:00:02:22:00:00:02 R 27.0.0.17(EV),27.0.0.18(EV)
1005 03:00:00:00:00:02:22:00:00:03 R 27.0.0.17(EV),27.0.0.18(EV)
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
3. Sample EAD route display
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
torm-11# sh bgp l2vpn evpn route type ead
BGP table version is 19, local router ID is 27.0.0.15
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal
Origin codes: i - IGP, e - EGP, ? - incomplete
EVPN type-1 prefix: [4]:[ESI]:[EthTag]:[IPlen]:[VTEP-IP]
EVPN type-2 prefix: [2]:[EthTag]:[MAClen]:[MAC]:[IPlen]:[IP]
EVPN type-3 prefix: [3]:[EthTag]:[IPlen]:[OrigIP]
EVPN type-4 prefix: [4]:[ESI]:[IPlen]:[OrigIP]
EVPN type-5 prefix: [5]:[EthTag]:[IPlen]:[IP]
Network Next Hop Metric LocPrf Weight Path
Extended Community
Route Distinguisher: 27.0.0.15:5
*> [1]:[0]:[03:00:00:00:00:01:11:00:00:01]:[128]:[0.0.0.0]
27.0.0.15 32768 i
ET:8 RT:5550:1009
*> [1]:[0]:[03:00:00:00:00:01:22:00:00:02]:[128]:[0.0.0.0]
27.0.0.15 32768 i
ET:8 RT:5550:1009
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
This is the base patch that brings in support for Type-1 routes.
It includes support for -
- Ethernet Segment (ES) management
- EAD route handling
- MAC-IP (Type-2) routes with a non-zero ESI i.e. Aliasing for
active-active multihoming
- Initial infra for consistency checking. Consistency checking
is a fundamental feature for active-active solutions like MLAG.
We will try to leverage the info in the EAD-ES/EAD-EVI routes to
detect inconsistencies in access config across VTEPs attached to
the same Ethernet Segment.
Functionality Overview -
========================
1. Ethernet segments are created in zebra and associated with
access VLANs. zebra sends that info as ES and ES-EVI objects to BGP.
2. BGP advertises EAD-ES and EAD-EVI routes for the locally attached
ethernet segments.
3. Similarly BGP processes EAD-ES and EAD-EVI routes from peers
and translates them into ES-VTEP objects which are then sent to zebra
as remote ESs.
4. Each ES in zebra is associated with a list of active VTEPs which
is then translated into a L2-NHG (nexthop group). This is the ES
"Alias" entry
5. MAC-IP routes with a non-zero ESI use the alias entry created in
(4.) to forward traffic i.e. a MAC-ECMP is done to these remote-ES
destinations.
EAD route management (route table and key) -
============================================
1. Local EAD-ES routes
a. route-table: per-ES route-table
key: {RD=ES-RD, ESI, ET=0xffffffff, VTEP-IP}
b. route-table: per-VNI route-table
Not added
c. route-table: global route-table
key: {RD=ES-RD, ESI, ET=0xffffffff}
2. Remote EAD-ES routes
a. route-table: per-ES route-table
Not added
b. route-table: per-VNI route-table
key: {RD=ES-RD, ESI, ET=0xffffffff, VTEP-IP}
c. route-table: global route-table
key: {RD=ES-RD, ESI, ET=0xffffffff}
3. Local EAD-EVI routes
a. route-table: per-ES route-table
Not added
b. route-table: per-VNI route-table
key: {RD=0, ESI, ET=0, VTEP-IP}
c. route-table: global route-table
key: {RD=L2-VNI-RD, ESI, ET=0}
4. Remote EAD-EVI routes
a. route-table: per-ES route-table
Not added
b. route-table: per-VNI route-table
key: {RD=0, ESI, ET=0, VTEP-IP}
c. route-table: global route-table
key: {RD=L2-VNI-RD, ESI, ET=0}
Please refer to bgp_evpn_mh.h for info on how the data-structures are
organized.
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
Add ESI as an inline attribute field along with the other EVPN
attributes. This may be re-worked when the rest of the EVPN
attributes find a new home.
Some cleanup has been done to get rid of stale/unused references
to ESI. And also to consolidate duplicate definitions of ES ID
types.
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
1. EAD routes require support for ESI_LABEL extended community. The
primary info in this EC is a flag that specifies if the ES is
single-active or active-active.
2. Also fixed up ES_IMPORT_RT string. Support was added a long time
ago for ESR/Type-4 routes but it has not really been exercised for
MH functionality till now.
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
Re-org only; no other code changes. This is being done to make maintenance
of MH functionality (which will have more code added to it) easy.
The code moved here was originally committed via -
'commit 50f74cf131 ("*: support for evpn type-4 route")'
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
Revert "zebra: support for macvlan interfaces"
This reverts commit bf69e212fd.
Revert "doc: add some documentation about bgp evpn netns support"
This reverts commit 89b97c33d7.
Revert "zebra: dynamically detect vxlan link interfaces in other netns"
This reverts commit de0ebb2540.
Revert "bgpd: sanity check when updating nexthop from bgp to zebra"
This reverts commit ee9633ed87.
Revert "lib, zebra: reuse and adapt ns_list walk functionality"
This reverts commit c4d466c830.
Revert "zebra: local mac entries populated in correct netnamespace"
This reverts commit 4042454891.
Revert "zebra: when parsing local entry against dad, retrieve config"
This reverts commit 3acc394bc5.
Revert "bgpd: evpn nexthop can be changed by default"
This reverts commit a2342a2412.
Revert "zebra: zvni_map_to_vlan() adaptation for all namespaces"
This reverts commit db81d18647.
Revert "zebra: add ns_id attribute to mac structure"
This reverts commit 388d5b438e.
Revert "zebra: bridge layer2 information records ns_id where bridge is"
This reverts commit b5b453a2d6.
Revert "zebra, lib: new API to get absolute netns val from relative netns val"
This reverts commit b6ebab34f6.
Revert "zebra, lib: store relative default ns id in each namespace"
This reverts commit 9d3555e06c.
Revert "zebra, lib: add an internal API to get relative default nsid in other ns"
This reverts commit 97c9e7533b.
Revert "zebra: map vxlan interface to bridge interface with correct ns id"
This reverts commit 7c990878f2.
Revert "zebra: fdb and neighbor table are read for all zns"
This reverts commit f8ed2c5420.
Revert "zebra: zvni_map_to_svi() adaptation for other network namespaces"
This reverts commit 2a9dccb647.
Revert "zebra: display interface slave type"
This reverts commit fc3141393a.
Revert "zebra: zvni_from_svi() adaptation for other network namespaces"
This reverts commit 6fe516bd4b.
Revert "zebra: importation of bgp evpn rt5 from vni with other netns"
This reverts commit 28254125d0.
Revert "lib, zebra: update interface name at netlink creation"
This reverts commit 1f7a68a2ff.
Signed-off-by: Pat Ruddy <pat@voltanet.io>
Added a macro to validate the v4 mapped v6 address.
Modified bgp receive & send updates for v4 mapped v6 address as
nexthop and installing it as recursive nexthop in RIB.
Minor change in fpm while sending the routes for nexthop as
v4 mapped v6 address.
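A sketch of what such a check looks at, written as a function over raw bytes rather than the macro the commit adds. The function name is hypothetical; the layout (80 zero bits, 16 one bits, then the IPv4 address) is the standard ::ffff:a.b.c.d form.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if the 16-byte address has the ::ffff:a.b.c.d layout. */
bool ipv6_is_v4_mapped(const uint8_t addr[16])
{
    for (int i = 0; i < 10; i++)
        if (addr[i] != 0)
            return false;
    return addr[10] == 0xff && addr[11] == 0xff;
}
```

POSIX systems also provide IN6_IS_ADDR_V4MAPPED in netinet/in.h for struct in6_addr, which performs the same test.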
Signed-off-by: Kaushik <kaushik@niralnetworks.com>
When we have a prefix that has been selected, note that that
particular flag has been set and give that information to the
end user.
eva# show bgp ipv4 uni neighbors 192.168.161.131 prefix-counts
Prefix counts for 192.168.161.131, IPv4 Unicast
PfxCt: 814246
Counts from RIB table walk:
Adj-in: 0
Damped: 0
Removed: 0
History: 0
Stale: 0
Valid: 814246
All RIB: 814246
PfxCt counted: 814246
PfxCt Best Selected: 0
Useable: 814246
eva# show bgp ipv4 uni neighbors 192.168.161.2 prefix-counts
Prefix counts for 192.168.161.2, IPv4 Unicast
PfxCt: 814070
Counts from RIB table walk:
Adj-in: 0
Damped: 0
Removed: 0
History: 0
Stale: 0
Valid: 814070
All RIB: 814070
PfxCt counted: 814070
PfxCt Best Selected: 814070
Useable: 814070
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Coverity has noticed that we are using bgp_evpn after
we have already NULL checked it one time. Add an assert
to make Coverity happy here, if we get to this point
something terrible has happened.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Coverity rightly points out that bgp_table_top might return
NULL and immediately deref'ing it might be a problem.
Add a bit of safety.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
I wanted to preserve the old code flow to see what might
be needed in the future in commit:
23ca3269da
Coverity doesn't like dead code. So let's comment it out.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
If _force_ is set, then ALL prefixes are counted for maximum instead of
accepted only. This is useful for cases where an inbound filter is applied,
but you want maximum-prefix to act on ALL (including filtered) prefixes.
For instance, we have a configuration like:
neighbor r1 maximum-prefix 10
neighbor r1 prefix-list custom in
!
ip prefix-list custom seq 1 permit 10.0.0.0/24
ip prefix-list custom seq 2 permit 10.0.1.0/24
This will accept only 2 prefixes and discard all others instead of
shutting down the session when 10 is reached.
With this new knob (force), we will count all received prefixes and shutdown
the session when 10 is reached.
The bigger problem is when you have lots of peers with full feeds and a
configuration like the one in the example above.
This effectively re-orders how the filter and maximum-prefix are applied.
Signed-off-by: Donatas Abraitis <donatas.abraitis@gmail.com>
While checking my BGP debugging settings at the console, I noticed
this message was missing a newline. Add it to be consistent with the
other similar messages.
Signed-off-by: Russell Bryant <rbryant@redhat.com>
During perf testing of receiving and installing 7.5 million
routes into zebra it was noticed that memset in bgp_zebra_announce
was taking ~11% of the runtime. With this change bgp_zebra_announce
now no longer has any appreciable time spent in memset as reported
by perf. In addition bgp_zebra_announce run time in perf was
reduced by a composite amount.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
```
(gdb) bt
0 0x00007f45a6f0a781 in raise () from /lib/x86_64-linux-gnu/libc.so.6
1 0x00007f45a6ef455b in abort () from /lib/x86_64-linux-gnu/libc.so.6
2 0x00007f45a7781920 in core_handler (signo=11, siginfo=0x7fffac7b84b0, context=<optimized out>) at lib/sigevent.c:228
3 <signal handler called>
4 0x000055a4133c0f32 in bgp_table_stats (vty=vty@entry=0x55a415acb240, bgp=0x0, afi=AFI_IP, safi=SAFI_UNICAST, json_array=json_array@entry=0x0) at bgpd/bgp_route.c:11412
5 0x000055a4133c13fb in show_ip_bgp_afi_safi_statistics (self=<optimized out>, vty=0x55a415acb240, argc=6, argv=<optimized out>) at bgpd/bgp_route.c:10749
6 0x00007f45a773917d in cmd_execute_command_real (vline=vline@entry=0x55a415ab7e10, vty=vty@entry=0x55a415acb240, cmd=cmd@entry=0x0, filter=FILTER_RELAXED)
at lib/command.c:909
7 0x00007f45a773afdf in cmd_execute_command (vline=vline@entry=0x55a415ab7e10, vty=vty@entry=0x55a415acb240, cmd=0x0, vtysh=vtysh@entry=0) at lib/command.c:968
8 0x00007f45a773b135 in cmd_execute (vty=vty@entry=0x55a415acb240, cmd=cmd@entry=0x55a415ace950 "show ip bgp vrf all statistics", matched=matched@entry=0x0,
vtysh=vtysh@entry=0) at lib/command.c:1122
9 0x00007f45a7794d62 in vty_command (vty=vty@entry=0x55a415acb240, buf=0x55a415ace950 "show ip bgp vrf all statistics") at lib/vty.c:526
10 0x00007f45a7794fb6 in vty_execute (vty=vty@entry=0x55a415acb240) at lib/vty.c:1293
11 0x00007f45a7797804 in vtysh_read (thread=<optimized out>) at lib/vty.c:2126
12 0x00007f45a778f641 in thread_call (thread=thread@entry=0x7fffac7bb040) at lib/thread.c:1550
13 0x00007f45a775b6d8 in frr_run (master=0x55a415542820) at lib/libfrr.c:1098
14 0x000055a4133815d6 in main (argc=10, argv=0x7fffac7bb2a8) at bgpd/bgp_main.c:509
```
"show ip bgp vrf all statistics" should show statistics for all VRFs if "all"
is specified.
Signed-off-by: Donatas Abraitis <donatas.abraitis@gmail.com>
1) When a session comes up for a peer and the peer has not advertised
the GR capabilities, BGP sends a request to Zebra to clear any
stale routes that might exist from that peer.
2) When an OPEN message is received from the peer, clear the GR
capability previously advertised by the peer if the latest received
OPEN message does not contain the GR capability.
Signed-off-by: NaveenThanikachalam <nthanikachal@vmware.com>
Remove mid-string line breaks, cf. workflow doc:
.. [#tool_style_conflicts] For example, lines over 80 characters are allowed
for text strings to make it possible to search the code for them: please
see `Linux kernel style (breaking long lines and strings)
<https://www.kernel.org/doc/html/v4.10/process/coding-style.html#breaking-long-lines-and-strings>`_
and `Issue #1794 <https://github.com/FRRouting/frr/issues/1794>`_.
Scripted commit, idempotent to running:
```
python3 tools/stringmangle.py --unwrap `git ls-files | egrep '\.[ch]$'`
```
Signed-off-by: David Lamparter <equinox@diac24.net>
It's hard to cope with cases when next-hop is changed/unchanged or
peers are non-direct.
It would be better to show the hostname and nexthop IP address (both)
under `show bgp` to quickly identify the source and the real next-hop
of the route.
If `bgp default show-nexthop-hostname` is toggled the output looks like:
```
spine1-debian-9# show bgp
BGP table version is 1, local router ID is 2.2.2.2, vrf id 0
Default local pref 100, local AS 65002
Status codes: s suppressed, d damped, h history, * valid, > best, = multipath,
i internal, r RIB-failure, S Stale, R Removed
Nexthop codes: @NNN nexthop's vrf id, < announce-nh-self
Origin codes: i - IGP, e - EGP, ? - incomplete
Network Next Hop Metric LocPrf Weight Path
* 2a02:4780::/64 fe80::a00:27ff:fe09:f8a3(exit1-debian-9)
0 0 65001 ?
spine1-debian-9# show ip bgp
BGP table version is 5, local router ID is 2.2.2.2, vrf id 0
Default local pref 100, local AS 65002
Status codes: s suppressed, d damped, h history, * valid, > best, = multipath,
i internal, r RIB-failure, S Stale, R Removed
Nexthop codes: @NNN nexthop's vrf id, < announce-nh-self
Origin codes: i - IGP, e - EGP, ? - incomplete
Network Next Hop Metric LocPrf Weight Path
*> 10.255.255.0/24 192.168.0.1(exit1-debian-9)
0 0 65001 ?
```
Signed-off-by: Donatas Abraitis <donatas.abraitis@gmail.com>
BFD profiles can now be used on the interface level like this:
interface eth1
ip router isis 1
isis bfd
isis bfd profile default
Here the 'default' profile needs to be specified as usual in the
bfdd configuration.
Signed-off-by: GalaxyGorilla <sascha@netdef.org>
```
exit1-debian-9# show bgp summary
IPv4 Unicast Summary:
BGP router identifier 192.168.0.1, local AS number 100 vrf-id 0
BGP table version 8
RIB entries 15, using 2880 bytes of memory
Peers 2, using 43 KiB of memory
Neighbor V AS MsgRcvd MsgSent TblVer InQ OutQ Up/Down State/PfxRcd PfxSnt
192.168.0.2 4 200 10 6 0 0 0 00:00:35 8 8
2a02:4780::2 4 0 0 1 0 0 0 never Active 0
Total number of neighbors 2
exit1-debian-9# show bgp summary established
IPv4 Unicast Summary:
BGP router identifier 192.168.0.1, local AS number 100 vrf-id 0
BGP table version 8
RIB entries 15, using 2880 bytes of memory
Peers 2, using 43 KiB of memory
Neighbor V AS MsgRcvd MsgSent TblVer InQ OutQ Up/Down State/PfxRcd PfxSnt
192.168.0.2 4 200 10 6 0 0 0 00:00:39 8 8
Total number of neighbors 2
exit1-debian-9# show bgp summary failed
IPv4 Unicast Summary:
BGP router identifier 192.168.0.1, local AS number 100 vrf-id 0
BGP table version 8
RIB entries 15, using 2880 bytes of memory
Peers 2, using 43 KiB of memory
Neighbor EstdCnt DropCnt ResetTime Reason
2a02:4780::2 0 0 never Waiting for peer OPEN
Total number of neighbors 2
exit1-debian-9#
```
Signed-off-by: Donatas Abraitis <donatas.abraitis@gmail.com>
If the RT changes on a L3VPN route then any leak of this route into
a VRF should be withdrawn.
Extend existing EVPN check for RT change to cover L3VPN routes.
Signed-off-by: Pat Ruddy <pat@voltanet.io>
the validation of rpki routes will impact the matching bgp instance.
Until now, the rpki was triggering validation of all bgp entries.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
rpki config can be displayed in the 'show running-config'.
There is a fix still to be done, related to the order of the per-vrf
rpki configuration: currently the output cannot be saved as a
running-config because the rpki commands are swapped, which prevents
the rpki config from being run at startup.
This commit also changes the indentation, since the rpki configure node
had one extra space. Reduce it, and add the changes for vrf
configuration too.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
rpki vrf subnode is instantiated under the vrf subnode.
It is to be noted that this commit contains a change in vtysh.
Actually, the output of bgp daemon from show running-config is extracted
in vtysh, and reengineered ( hence the vtysh_config.c change done). This
permits having a subnode under vrf sub node.
Also, add vrf node support to bgpd, as rpki command can not be found
under vrf node.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
it is possible to dump rpki commands per vrf context.
also, rpki start/stop commands are also appended with vrfname parameter.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
This commit introduces a callback function pointer that rtrlib
calls. This permits creating the socket and initialising it
with the right information, in the right vrf. In addition, rpki uses
a hook that is triggered when a vrf is enabled/disabled. In this way,
the start mechanism is triggered only when the vrf is available, and
the stop mechanism runs upon the vrf disable event.
In addition, the cache structure contains a back pointer to the rpki
vrf structure. This is done to retrieve the vrf that the cache points
to.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
rpki context can be removed by doing 'no rpki' command from configure
node. this work allows to allocate the associated rpki_vrf context when
entering in rpki node, instead of at the initialisation step.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
this work is a preparatory work so that rpki can have per-vrf contexts.
the work consists in allocating a rpki_vrf structure with all inside:
rtr_config, cache, etc..
This work is also necessary for long-term support of the yang
northbound API. Indeed, it is highly possible that the yang context
for rpki will be defined per core instance.
That work also instantiates a list of rpki_vrf, though only one instance
is created.
That work also introduces a vrfname field attribute that is set to null
for now , and stands for default vrf where rpki is configured on.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
rpki debugging is linked with standard bgp debugging facilities.
- debug rpki is dumped in running-config if the command is executed from
configure terminal.
- show debugging indicated whether rpki debug is enabled or not.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
when a plugin is attached, some debugs may be attached to that plugin.
For that, add one hook that is interacting with vty: a boolean indicates
what the usage is for: either for impacting the 'show running-config',
or for impacting the 'show debugging' command.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
the show running-config rpki was displaying systematically the default
values, when at least one cache server was configured. now, if the rpki
configuration has been changed, either because of a new cache server, or
because of a change in the default settings, then the associated
configuration is dumped in the 'show running-config' command.
adding to this, to permit user to dump the settings values, the command
'show rpki configuration' dumps the values whatever default or not.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
if ssh cache servers are configured, then show rpki-table is looking at
the tcp server context. Fix this by checking the server cache type, and
also display the ssh context if this is configured.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Currently, the private and public key file names must differ by the
suffix '.pub'. If that is not the case, the public key is ignored.
Inform the user about that.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
We have a bunch of code in bgp_vty.c that was passing
to peer_af_flag_modify_vty more than 1 flag at a time.
This was causing the underlying routines to get the
flags wrong. In order to prevent this convert all the
places where we send multiple flags down to this function
to individual flag changes.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
SIGHUP is ostensibly supposed to reload configuration
from a fresh slate. This is currently horribly broken
so much so that bgp just crashes. I see no point
in trying to make this work considering the yang
work coming down the pike.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Issue: bgp_process_writes will be called when the fd is writable.
It then calls bgp_generate_updgrp_packets to generate the
update packets whether or not MRAI is set.
Fix: the bgp_generate_updgrp_packets thread will return without sending
any update while the MRAI timer is still running.
Signed-off-by: Richard Wu <wutong23@baidu.com>
This is the bulk part extracted from "bgpd: Convert from `struct
bgp_node` to `struct bgp_dest`". It should not result in any functional
change.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
We use ASN:VNI format to calculate auto RT for L3VNI.
When L3VNI is not configured, if we delete the configured RT, incorrect auto-RT
value is generated as VRF VNI is 0.
Fix:
Do not configure auto-RT if L3VNI is not configured.
Trigger:
1. Delete L3VNI
2. Delete configured RT.
Before fix:
dev# sh bgp vrf vrf-blue vni
BGP VRF: vrf-blue
Local-Ip: 10.100.0.1
L3-VNI: 0
Rmac: 00:00:00:00:00:00
VNI Filter: none
L2-VNI List:
Export-RTs:
RT:101:0
Import-RTs:
RT:101:0
RD: 10.100.0.1:2
After fix:
dev# sh bgp vrf vrf-blue vni
BGP VRF: vrf-blue
Local-Ip: 10.100.0.1
L3-VNI: 0
Rmac: 00:00:00:00:00:00
VNI Filter: none
L2-VNI List:
Export-RTs:
Import-RTs:
RD: 10.100.0.1:2
Signed-off-by: Ameya Dharkar <adharkar@vmware.com>
If we have something like:
```
ip route 1.1.1.0/24 Null0
!
router bgp 100
no bgp ebgp-requires-policy
neighbor 192.168.0.2 remote-as 200
!
address-family ipv4 unicast
network 1.1.1.0/24
redistribute connected
exit-address-family
!
line vty
!
```
1.1.1.0/24 is not advertised due to martian nexthop (0.0.0.0). It starts
working only when we use `redistribute static`.
By checking if it's a BGP static route we are able to announce
1.1.1.0/24 with `network 1.1.1.0/24` without redistribute even when
`bgp import-check` is enabled.
Disabling `bgp import-check` works as well, but it's enabled by default
since 7.4.
Signed-off-by: Donatas Abraitis <donatas.abraitis@gmail.com>
RFC states that time should be in seconds since the epoch.
The code was using system uptime in seconds.
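The difference between the two clocks can be sketched as below. The
helper name is hypothetical, and the use of `CLOCK_MONOTONIC` as the
uptime source is an assumption for illustration:

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <time.h>

/*
 * Hypothetical helper: convert a timestamp taken on the monotonic
 * (uptime-like) clock into seconds since the epoch, by measuring how
 * long ago the event happened and subtracting that from wall-clock now.
 */
time_t mono_to_epoch(time_t mono_then)
{
	struct timespec mono_now;
	int rc = clock_gettime(CLOCK_MONOTONIC, &mono_now);

	assert(rc == 0);
	(void)rc;

	time_t age = mono_now.tv_sec - mono_then;

	return time(NULL) - age;
}
```

A timestamp taken just now on the monotonic clock converts to a value
close to `time(NULL)`, whereas the raw uptime value would typically be
decades off.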
Fixes: #6549
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Announcements that are marked as invalid were previously not revalidated.
This was fixed by replacing the range lookup with a subtree lookup.
Signed-off-by: Marcel Röthke <marcel.roethke@haw-hamburg.de>
Currently the I/O pthread handles incoming/outgoing data
communication with all peers. There is no attempt at modifying
the hold timers. Its sole goal is to read/write data to appropriate
channels. All this data is handled as *events* on the master pthread
in BGP. The problem is that if the master pthread is extremely busy
then any packet read that would be treated as a keepalive event may
happen after the hold timer pops, due to the way thread events are handled
in lib/thread.c.
In a last-gasp attempt, if we notice that we have incoming data
to process on the input queue, slightly delay the hold timer.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The tr_*_config structs were previously not pre initialized because
every field is initialized explicitly. But future rtrlib version will
introduce additional fields. Preinitialising the entire struct will
ensure forward compatibility.
Signed-off-by: Marcel Röthke <marcel.roethke@haw-hamburg.de>
Problem reported where bgp sessions were being torn down for ibgp
peers with the reason being optional attribute error. Found that
when a route was leaked, the RTs were stripped but the actual
EXTCOMMUNITY attribute was not cleared so an empty ecommunity
attribute stayed in the bgp table and was sent in updates.
Ticket: CM-30000
Signed-off-by: Don Slice <dslice@cumulusnetworks.com>
When a peer is bound to a peer-group, the GR flags set on the
peer are over-written.
Update the GR flags for the peer after it has been bound to a
peer-group.
Signed-off-by: NaveenThanikachalam <nthanikachal@vmware.com>
The code in the bgp extcommunity-list function was using
argv_find to get the correct idx. The problem was that
we had already done argv_finds before and idx was non-zero
thus having us always set the seq pointer to what was last
looked up. This causes us to pass in a value to the
underlying function and it would just wisely ignore it
causing a seq number of 0.
We would then write this seq number of 0 and then immediately
reject it on read in again. BOO!
Actually handle argv_find the way it was meant to be.
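A simplified sketch of the pitfall, using a hypothetical stand-in for
argv_find that works on plain strings. The assumed behaviour is that
the search starts at *idx and only updates *idx on a match, so the
return value, not a non-zero idx, says whether the token was found:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in: search from *idx, update *idx only on a match. */
bool find_token(const char *const *argv, int argc, const char *text, int *idx)
{
	assert(argv != NULL && idx != NULL);

	for (int i = *idx; i < argc; i++) {
		if (strcmp(argv[i], text) == 0) {
			*idx = i;
			return true;
		}
	}
	return false;
}

/* Returns the argument following "seq", or NULL when "seq" is absent. */
const char *seq_arg(const char *const *argv, int argc)
{
	int idx = 0;

	/* An earlier lookup leaves idx non-zero. */
	(void)find_token(argv, argc, "permit", &idx);

	/* Reset before the next lookup and trust only the return value. */
	idx = 0;
	if (find_token(argv, argc, "seq", &idx) && idx + 1 < argc)
		return argv[idx + 1];

	return NULL;
}
```

Testing `idx != 0` instead of the return value would pick up whatever
token the previous lookup left behind.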
Ticket:CM-29926
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
When issuing the command `match ip next-hop address`
bgp would crash. This is because the no form of the
command was making the address optional and we would
try to read data we should not be.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
bgp_accept() gets called over and over again when a VRF device is
deleted out from under a bgp listener socket that is bound to it.
Prevent this by noting the error and cancelling ourselves, allowing the
vrf status code to clean up the mess when it receives word about the
change from Zebra.
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
Try to give a bit more useful data about where we
think the connection is trying to come in from.
Hopefully this will let us debug connection issues
a bit faster in cases where there are config issues.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
When a received packet is processed in bgp_process_reads(), the data
is copied to a static buffer and then copied to the stream buffer.
The data can be copied directly to the stream buffer, which avoids the
extra memcpy.
Signed-off-by: kssoman <somanks@gmail.com>
Don't attempt to send BFD daemon a message to remove the peer
registration on daemon exit, otherwise we'll access a dangling
interface pointer and we'll crash.
This crash was not previously possible because the function that built
the message was passing the interface pointer but not using it due to
the exit condition.
In `lib/bfd.c`:
```
void bfd_peer_sendmsg(struct zclient *zclient, struct bfd_info *bfd_info,
int family, void *dst_ip, void *src_ip, char *if_name,
int ttl, int multihop, int cbit, int command,
int set_flag, vrf_id_t vrf_id)
{
struct bfd_session_arg args = {};
size_t addrlen;
/* Individual reg/dereg messages are suppressed during shutdown. */
if (CHECK_FLAG(bfd_gbl.flags, BFD_GBL_FLAG_IN_SHUTDOWN)) {
if (bfd_debug)
zlog_debug(
"%s: Suppressing BFD peer reg/dereg messages",
__func__);
return;
}
```
Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
To remove a BFD profile without removing the BFD configuration just call
`neighbor <A.B.C.D|X:X::X:X|WORD> bfd`.
Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
Allow BGP to use the new API to configure BFD session profiles. Now it
is possible to preconfigure BFD sessions without needing to create the
peers.
Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
"set community accept-own-nexthop" returns "malformed communities"
error. This is because the token matching hits an earlier "accept-own"
and leaves "-nexthop" as a separate token to be processed.
Reorder the switch cases so that both are processed correctly.
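The ordering problem can be sketched with a small prefix matcher. The
enum, table and function are hypothetical and are not the bgpd
tokenizer; the point is only that the longer keyword has to be tried
before its own prefix:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum community_token {
	TOKEN_UNKNOWN,
	TOKEN_ACCEPT_OWN,
	TOKEN_ACCEPT_OWN_NEXTHOP,
};

/*
 * Match a well-known community keyword at the start of buf and report
 * how many characters were consumed. "accept-own-nexthop" is listed
 * before "accept-own" so the shorter keyword cannot match first and
 * leave "-nexthop" behind as a stray token.
 */
enum community_token match_community_keyword(const char *buf, size_t *consumed)
{
	static const struct {
		const char *name;
		enum community_token tok;
	} table[] = {
		{"accept-own-nexthop", TOKEN_ACCEPT_OWN_NEXTHOP},
		{"accept-own", TOKEN_ACCEPT_OWN},
	};

	assert(buf != NULL && consumed != NULL);

	for (size_t i = 0; i < sizeof(table) / sizeof(table[0]); i++) {
		size_t len = strlen(table[i].name);

		if (strncmp(buf, table[i].name, len) == 0) {
			*consumed = len;
			return table[i].tok;
		}
	}

	*consumed = 0;
	return TOKEN_UNKNOWN;
}
```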
Signed-off-by: Appu Joseph <apjo@kaloom.com>
We are crashing in thread_cancel on shutdown because
the thread pointer is NULL. Use the more appropriate
THREAD_CANCEL macro
Ticket: CM-29873
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Extend the next hop tracking for type-2 and type-3 EVPN routes also.
Updates: "bgpd: Add nexthop of received EVPN RT-5 for nexthop tracking"
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
When there is a NHT change and the paths dependent on that NHT are being
evaluated, skip those that are marked for removal or as history.
When a route gets withdrawn, its valid flag is cleared and it is flagged
for removal; in the case of an EVPN route, it is also unimported from
VRFs (L2 and/or L3). bgp_process is then scheduled. Under rare timing
conditions, an NHT update for the route's next hop may arrive right after,
and if routes flagged for removal are not skipped, they may not only be
incorrectly marked as valid but also re-imported in the case of EVPN,
which will be a serious error.
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
Ensure that only if there is a change to the path's validity based
on the NHT update, EVPN import or unimport is invoked.
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
Display next hop resolution information, whether the "detail" option is
specified or not as it is quite fundamental and only minimally increases
the output.
Introduce option to look at a specific NHT entry, which will also show
the paths associated with that entry.
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
Clean up a few lines of cli command installation; remove a
duplicate; follow the command grouping pattern better.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
There can be cases where evpn traffic is not meshed across various
endpoints, but sent to a central pe. For this situation, remove the
nexthop unchanged default behaviour for bgp evpn. Also add route
reflector commands to bgp evpn node.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Non-best paths (path info structures) also need to be freed during
table cleanup not only to release their memory but to also ensure
any linkages are updated correctly. One such example is for EVPN
where there is a link between the imported path info (in a L2 or
L3 vrf instance) and its parent path info.
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
We had already removed the `ip as-path..` command
to have `bgp as-path` but for some reason a `no ip as-path..`
command ALIAS was still around. Kill with extreme prejudice.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Without specifying a default afi/safi we get a segfault:
```
(gdb) frame 4
bgp_table_stats (..., afi=32724, safi=SAFI_UNICAST, ...
11349 if (!bgp->rib[afi][safi]) {
(gdb)
```
Signed-off-by: Donatas Abraitis <donatas.abraitis@gmail.com>
L3VNI is configured with "prefix-routes-only" flag. Even in this case,
intermittently, we observed that local EVPN MACIP routes are installed and
advertised with 2 labels and 2 export RTs.
This is a sequencing issue. Consider following case where L2VNI 200 and L3VNI
1000 are configured for tenant vrf vrf-blue.
Bug is observed for following sequence of events:
1. vrf-blue BGP instance is created.
2. L2VNI is created in bgp for vni 200. It is linked to the tenant vrf vrf-blue
in function bgpevpn_link_to_l3vni.
Following code sets "VNI_FLAG_USE_TWO_LABELS" flag for vni 200 as L3VNI is not
yet attached to vrf-blue BGP instance.
/* check if we are advertising two labels for this vpn */
if (!CHECK_FLAG(bgp_vrf->vrf_flags, BGP_VRF_L3VNI_PREFIX_ROUTES_ONLY))
SET_FLAG(vpn->flags, VNI_FLAG_USE_TWO_LABELS);
3. Now L3VNI is attached to vrf-blue BGP instance. In this case, we set
BGP_VRF_L3VNI_PREFIX_ROUTES_ONLY flag for vrf-blue but we do not clear
VNI_FLAG_USE_TWO_LABELS flag set on the corresponding L2VNIs.
This fix resolves following 2 issues observed above.
1. When L2VNI is created in BGP, flag VNI_FLAG_USE_TWO_LABELS should not be set
for this VNI if BGP vrf is not attached to any L3VNI.
2. When L3VNI is attached to the BGP vrf, set "VNI_FLAG_USE_TWO_LABELS" flag
if "prefix-routes-only" is not set for the vrf.
UT cases:
1. Flap "prefix-routes-only" config for a vrf.
2. Test following triggers for vrfs with and without "prefix-routes-only"
- Flap L2VNI from kernel.
- Flap L3VNI from kernel.
Signed-off-by: Ameya Dharkar <adharkar@vmware.com>
The `bgp bestpath bandwidth` command should not be a legal
command. Pull out the `no` form to allow this. Allow
`no bgp bestpath bandwidth` to work as we would expect.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
This is not the attribute involved in path selection and by rfc7606 it should
be just ignored.
Signed-off-by: Donatas Abraitis <donatas.abraitis@gmail.com>
Community attributes might have been removed by an inbound route map, so we
should check to ensure they still exist before trying to free them.
This fixes a segfault described in issue #6345.
Signed-off-by: Josh Cox <josh.cox@pureport.com>
The problem is that peer_af_array returns NULL when SAFI is changed to
unicast. We use unicast table, but peer is created and activated under
labeled-unicast, hence we should lookup with a proper SAFI id.
Without this patch peer_af_find() returns NULL and we can't show
PfxSnt in `show bgp summary`.
Signed-off-by: Donatas Abraitis <donatas.abraitis@gmail.com>
It is possible that the if_lookup_by_index() call will return
a NULL value before we call zclient_send_interface_radv_req. Just
test that we have a valid interface pointer.
Found by Coverity
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
unicast and labeled-unicast share the same table, but configuration should
be visible for both independently. Without this fix it confuses a bit
because when you enter `network 10.0.0.0/24` under labeled-unicast it's
written in unicast family block.
Signed-off-by: Donatas Abraitis <donatas.abraitis@gmail.com>
Modify the import-check command to require the underlying prefix
to exist in the rib. General consensus is that this is the correct
behavior.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Problem reported that in many circumstances, RAs created in the
process of bringing up numbered IPv6 peers with extended-nexthop
capability enabled (for ipv4 over ipv6) were not stopped on the
interface when those peers were deleted. Found several circumstances
where this occurred and fix them in this patch.
Ticket: CM-26875
Signed-off-by: Don Slice <dslice@cumulusnetworks.com>
These are easy to get subtly wrong, and doing so can cause
nondeterministic failures when racing in parallel builds.
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
Issue:
Configuring default-originate when static default route is previously
advertised results in withdrawal of the route.
Fix :
Delete the adj-out entry for the previously advertised static
default route without sending explicit withdraw message.
Signed-off-by: kssoman <somanks@gmail.com>
Flowspec NLRI above 240 bytes in size was not handled.
Over 240 bytes, the length field is 2 bytes long, and a calculation
must be done to obtain the real length. This commit handles it
appropriately.
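A minimal sketch of the length calculation, assuming the encoding from
RFC 5575 as I understand it: a first octet below 0xf0 is the length
itself, otherwise the low nibble of the first octet and the whole second
octet form a 12-bit length. The function name is hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Decode the flowspec NLRI length field at buf.
 * Returns the number of length octets consumed (1 or 2),
 * or 0 when the buffer is too short.
 */
size_t flowspec_nlri_length(const uint8_t *buf, size_t buflen,
			    uint16_t *nlri_len)
{
	assert(buf != NULL && nlri_len != NULL);

	if (buflen < 1)
		return 0;

	if (buf[0] < 0xf0) {
		/* Short form: the octet is the length. */
		*nlri_len = buf[0];
		return 1;
	}

	if (buflen < 2)
		return 0;

	/* Extended form: 0xfnnn, strip the high nibble of the first octet. */
	*nlri_len = (uint16_t)(((buf[0] & 0x0f) << 8) | buf[1]);
	return 2;
}
```

For example, the octets 0xf1 0x2c decode to a length of 300.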
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
- Fix 1 byte overflow when showing GR info in bgpd
- Use PATH_MAX for path buffers
- Use unsigned specifiers for uint16_t's in zebra pbr
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
Replace sprintf with snprintf where straightforward to do so.
- sprintf's into local scope buffers of known size are replaced with the
equivalent snprintf call
- snprintf's into local scope buffers of known size that use the buffer
size expression now use sizeof(buffer)
- sprintf(buf + strlen(buf), ...) replaced with snprintf() into temp
buffer followed by strlcat
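A sketch of a bounded append. Note this is a different technique from
the temp-buffer-plus-strlcat approach used in the commit: it writes at
an offset with snprintf, because strlcat is not available in every
libc. The helper name is hypothetical:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Append "key=value " to the NUL-terminated string in buf without
 * writing past bufsz. Returns 0 on success, -1 if the output did not
 * fit (buf stays NUL-terminated, possibly with a truncated tail).
 */
int append_kv(char *buf, size_t bufsz, const char *key, int value)
{
	assert(buf != NULL && key != NULL);

	size_t used = strlen(buf);

	if (used >= bufsz)
		return -1;

	/* snprintf never writes more than the remaining space. */
	int n = snprintf(buf + used, bufsz - used, "%s=%d ", key, value);

	if (n < 0 || (size_t)n >= bufsz - used)
		return -1;

	return 0;
}
```

Unlike `sprintf(buf + strlen(buf), ...)`, a too-long append is reported
to the caller instead of overflowing the buffer.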
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
When we receive an UPDATE with MP_NEXTHOP len as 32 bytes, we shouldn't
check if the global (1st) nexthop is unspecified.
Peering between bird and FRRouting we receive from Bird something like:
```
rcvd UPDATE w/ attr: , origin i, mp_nexthop ::(fe80::a00:27ff:fe09:f8a3)
```
The link-local (2nd) nexthop is valid and validated later in the code.
Before it was marked:
```
IPv6 unicast -- DENIED due to: martian or self next-hop;
```
After it's a valid prefix:
```
spine1-debian-9# show bgp
BGP table version is 0, local router ID is 2.2.2.2, vrf id 0
Default local pref 100, local AS 65002
Status codes: s suppressed, d damped, h history, * valid, > best, = multipath,
i internal, r RIB-failure, S Stale, R Removed
Nexthop codes: @NNN nexthop's vrf id, < announce-nh-self
Origin codes: i - IGP, e - EGP, ? - incomplete
Network Next Hop Metric LocPrf Weight Path
2a02:4780::/64 fe80::a00:27ff:fe09:f8a3
0 65001 i
Displayed 1 routes and 1 total paths
```
Signed-off-by: Donatas Abraitis <donatas.abraitis@gmail.com>
Replace all `random()` calls with a function called `frr_weak_random()`
and make it clear that it is only supposed to be used for weak random
applications.
Use the annotation described by the Coverity Scan documentation to
ignore `random()` call warnings.
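A sketch of what such a wrapper can look like. The
`coverity[dont_call]` comment is assumed to be the annotation in
question, and `jittered_interval` is a hypothetical caller added only
for illustration:

```c
#define _DEFAULT_SOURCE

#include <assert.h>
#include <stdlib.h>

/*
 * Thin wrapper around random(). Only for weak random uses such as
 * timer jitter; never for anything security sensitive.
 */
long frr_weak_random(void)
{
	/* coverity[dont_call] */
	return random();
}

/* Hypothetical caller: add up to max_jitter - 1 to a base interval. */
long jittered_interval(long base, long max_jitter)
{
	assert(base >= 0);

	if (max_jitter <= 0)
		return base;

	return base + frr_weak_random() % max_jitter;
}
```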
Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
In the real world it sometimes happens that bgp_nexthop_cache is NULL.
Avoid segfaulting when using `show [ip] bgp ...` CLI commands.
Signed-off-by: Donatas Abraitis <donatas.abraitis@gmail.com>
Rather than doing a f*gly hack for the RPKI code, let's do an on-exit
hook in cmd_node. Also allows replacing some special-casing in the vty
code.
Signed-off-by: David Lamparter <equinox@diac24.net>
And again for the name. Why on earth would we centralize this, just so
people can forget to update it?
Signed-off-by: David Lamparter <equinox@diac24.net>
Same as before, instead of shoving this into a big central list we can
just put the parent node in cmd_node.
Signed-off-by: David Lamparter <equinox@diac24.net>
There is really no reason to not put this in the cmd_node.
And while we're at it, rename from pointless ".func" to ".config_write".
[v2: fix forgotten ldpd config_write]
Signed-off-by: David Lamparter <equinox@diac24.net>
The only nodes that have this as 0 don't have a "->func" anyway, so the
entire thing is really just pointless.
Signed-off-by: David Lamparter <equinox@diac24.net>
The problem is when using kinda such topologies:
(192.168.1.1/32) r1 <-- eBGP --> r2 <-- iBGP --> r3
Looking at r3's nexthop for 192.168.1.1/32 we have it as r2, but really
it MUST be r1.
Checking if the nexthop is connected solves the problem even for cases
when route-reflectors are used.
Signed-off-by: Donatas Abraitis <donatas.abraitis@gmail.com>
This fixes unnecessary whitespaces and makes capitalization
match for route type help strings.
Signed-off-by: Trey Aspelund <taspelund@cumulusnetworks.com>
Some competitive vendors like Cisco, Bird, OpenBGPD,
Nokia already have this by default enabled.
The list is here: https://github.com/bgp/RFC8212
Signed-off-by: Donatas Abraitis <donatas.abraitis@gmail.com>
Problem Description:
=====================
+--+ +--+
|R1|-(192.201.202.1)----iBGP----(192.201.202.2)-|R2|
+--+ +--+
Routes on R2:
=============
S>* 202.202.202.202/32 [1/0] via 192.201.78.1, ens256, 00:40:48
Where, the next-hop network, 192.201.78.0/24, is a directly connected network address.
C>* 192.201.78.0/24 is directly connected, ens256, 00:40:48
Configurations on R1:
=====================
!
router bgp 201
bgp router-id 192.168.0.1
neighbor 192.201.202.2 remote-as 201
!
Configurations on R2:
=====================
!
ip route 202.202.202.202/32 192.201.78.1
!
router bgp 201
bgp router-id 192.168.0.2
neighbor 192.201.202.1 remote-as 201
!
address-family ipv4 unicast
redistribute static
exit-address-family
!
Step-1:
=======
R1 receives the route 202.202.202.202/32 from R2.
R1 installs the route in its BGP RIB.
Step-2:
=======
On R1, a connected interface address is added.
The address is the same as the next-hop of the BGP route received from R2 (192.201.78.1).
Point of Failure:
=================
R1 resolves the BGP route even though the route's next-hop is its own connected address.
Even though this appears to be a misconfiguration it would still be better to safeguard the code against it.
Fix:
====
When BGP receives a connected route from Zebra, it processes the
routes for the next-hop update.
While doing so, BGP must ignore routes whose next-hop address matches
the address of the connected route for which Zebra sent the next-hop update
message.
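
As a rough illustration of the check described above (written in Python
rather than the daemon's C, with a hypothetical function name and a
hypothetical list-of-tuples representation of BGP paths), a sketch could
look like this:

```python
import ipaddress


def paths_to_reevaluate(paths, connected_prefix, connected_addr):
    """Return the paths that may be resolved via the connected route.

    paths: list of (prefix, nexthop) strings (hypothetical representation).
    connected_prefix: connected network carried in the next-hop update.
    connected_addr: the local interface address on that network.
    """
    net = ipaddress.ip_network(connected_prefix)
    own = ipaddress.ip_address(connected_addr)
    usable = []
    for prefix, nexthop in paths:
        nh = ipaddress.ip_address(nexthop)
        if nh not in net:
            continue  # not covered by this connected route
        if nh == own:
            continue  # next hop is our own address: must not resolve
        usable.append((prefix, nexthop))
    return usable
```

With the addresses from the problem description, the route
202.202.202.202/32 via 192.201.78.1 is skipped once 192.201.78.1 becomes a
local address on R1.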
Signed-off-by: NaveenThanikachalam <nthanikachal@vmware.com>
Ensure that upon a link-bandwidth change (e.g., due to a change in
the number of multipaths) EVPN type-5 route injection is triggered.
In the absence of this, the proper link-bandwidth is not updated in
EVPN type-5 routes originated by the router.
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
take into account polychaeta tips on code style.
also, take into account miscellaneous code style recommendations like
braces usage.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Multiple different issues causing mostly UAFs but maybe other more
subtle things.
- Cluster lists were the only attributes whose pointers were not being
NULL'd when freed, resulting in heap UAF
- When performing an insert into the cluster hash, our temporary struct
used for hash_get() was inconsistent with our hash keying and
comparison functions. In the case of a zero length cluster list, the
->length field is 0 and the ->list field is NULL. When performing an
insert, we set the ->list field regardless of whether the length is 0.
This resulted in the two cluster lists hashing equal but not comparing
equal. Later, when removing one of them from the hash before freeing
it, because the key matched and the comparison succeeded (because it
was set to NULL *after* the search but *before* inserting into the
hash) we would sometimes release the duplicated copy of the struct,
and then free the one that remained in the hash table. Later accesses
constitute UAF. This is fixed by making sure the fields used for the
existence check match what is actually inserted into the hash when
that check fails.
This patch also makes cluster_unintern static, because it should be.
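
The hash inconsistency is easier to see in a small model. The following is
only a Python sketch of the idea, not the bgpd code: the class and function
names are hypothetical, and None stands in for a NULL ->list pointer.

```python
class Cluster:
    def __init__(self, length, data):
        self.length = length
        self.data = data  # None models a NULL ->list pointer


def cluster_key(c):
    # The hash only covers the first `length` bytes of the list.
    return hash(bytes(c.data[:c.length]) if c.data else b"")


def cluster_equal(a, b):
    if a.length != b.length:
        return False
    if a.data is b.data:
        return True
    if a.data is None or b.data is None:
        return False
    return a.data[:a.length] == b.data[:b.length]


def make_lookup_key(raw, length):
    # Normalise the temporary key so it matches what gets stored:
    # a zero-length cluster list never carries a list pointer.
    return Cluster(length, raw if length else None)
```

A zero-length key that still carries a buffer hashes like the stored entry
but does not compare equal to it; building the key through
make_lookup_key() keeps the lookup and the insert consistent.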
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
... Oops ...
(for context, the defaults code originally didn't have a dedicated
"bool" variant and just used long for bools... I derp'd this when
adding bool as a separate case :( )
Reported-by: Donald Sharp <sharpd@cumulusnetworks.com>
Signed-off-by: David Lamparter <equinox@diac24.net>
This macro is undefined if vnc is disabled, and while it defaults to 0,
this is still wrong and causes issues with -Werror
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
This is a full rewrite of the "back end" logging code. It now uses a
lock-free list to iterate over logging targets, and the targets
themselves are as lock-free as possible. (syslog() may have a hidden
internal mutex in the C library; the file/fd targets use a single
write() call which should ensure atomicity kernel-side.)
Note that some functionality is lost in this patch:
- Solaris printstack() backtraces are ditched (unlikely to come back)
- the `log-filter` machinery is gone (re-added in followup commit)
- `terminal monitor` is temporarily stubbed out. The old code had a
race condition with VTYs going away. It'll likely come back rewritten
and with vtysh support.
- The `zebra_ext_log` hook is gone. Instead, it's now much easier to
add a "proper" logging target.
v2: TLS buffer to get some actual performance
Signed-off-by: David Lamparter <equinox@diac24.net>
- each statistic is keyed by the concatenated "<afi><safi>" value.
- the json encoding for float and double values now uses the json
double api. this change is done for bgp statistics.
- the lines over 80 characters have been handled.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
this command is a shortcut to facilitate the extraction of statistics
for all afi/safi related to one bgp instance.
the command is: show bgp [vrf XX] statistics-all [json]
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
safis that use a route distinguisher in bgp tables, and as such
introduce a two level hierarchy on the bgp table, must be made available
to statistics too.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
add json support for show bgp statistics command.
The title of the stats entry is aggregated without spaces.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
The BGP Router MAC extended community should be unique and not occur
multiple times. In a VRF-to-VRF route-leak scenario where EVPN routes
from a source VRF are leaked into the target VRF and then injected
back into EVPN from the target VRF, the resulting route had more than
one RMAC. With this fix, the resulting route will have only the
target VRF's RMAC.
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
The EVPN advertise route-map may generate extended communities for an IPv4
or IPv6 route injected into EVPN as type-5. If so, allow for it and add
to it.
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
Reviewed-by: Don Slice <dslice@cumulusnetworks.com>
Reviewed-by: Donald Sharp <sharpd@cumulusnetworks.com>
Support configurable options to control how link bandwidth is handled
by the receiver. The default behavior is to automatically honor the
link bandwidths received and use it to perform a weighted ECMP BUT only
if all paths in the multipath have associated link bandwidth; if one or
more paths do not have link bandwidth, normal ECMP is performed among
the multipaths. This behavior is as recommended by
https://tools.ietf.org/html/draft-ietf-idr-link-bandwidth.
The additional options available are to (a) completely ignore any link
bandwidth (i.e., weighted ECMP is effectively disabled), (b) skip paths
in the multipath which do not have link bandwidth and perform weighted
ECMP among the other paths (if at least some paths have the bandwidth)
or (c) use a default weight (value chosen is 1) for the paths which
do not have link bandwidth.
The command syntax is
bgp bestpath bandwidth <ignore|skip-missing|default-weight-for-missing>
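
The decision described above can be sketched as follows. This is a Python
illustration only; the function name and return shape are hypothetical, and
treating "no path has bandwidth" as plain ECMP under
default-weight-for-missing is an assumption.

```python
DEFAULT_WEIGHT = 1  # weight for paths without link bandwidth, per the text


def plan_weighted_ecmp(bandwidths, mode=None):
    """Decide how to treat a multipath set.

    bandwidths: one entry per path, None where link bandwidth is missing.
    mode: None (default behavior), "ignore", "skip-missing" or
          "default-weight-for-missing".
    Returns None for plain ECMP, else {path index: bandwidth}; a None
    bandwidth in the result means "install with DEFAULT_WEIGHT".
    """
    if mode == "ignore":
        return None
    have = {i: bw for i, bw in enumerate(bandwidths) if bw is not None}
    if not have:
        return None  # assumption: nothing to weight on, use plain ECMP
    if mode is None:
        # Default: weighted ECMP only if every path has bandwidth.
        return have if len(have) == len(bandwidths) else None
    if mode == "skip-missing":
        return have
    if mode == "default-weight-for-missing":
        return dict(enumerate(bandwidths))
    raise ValueError("unknown mode: %r" % (mode,))
```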
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
When announcing ourselves as the next hop (e.g., to EBGP peers), if the
best path has the link bandwidth extended community and it is transitive,
change the value of the link bandwidth to the cumulative downstream
bandwidth (sum of the link bandwidths of all our multipaths) as this
makes the most sense. It is also implied by
https://tools.ietf.org/html/draft-mohanty-bess-ebgp-dmz. Of course, do
not override the link bandwidth if it has been specified by policy.
Note: Transitive extended communities will be automatically passed along
to EBGP peers; this commit is updating the value that is announced to
something that is the most appropriate.
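
A small Python sketch of the rule, with a hypothetical function name and
arguments (the real code works on path attributes, not plain numbers):

```python
def advertised_link_bandwidth(best_bw, transitive, multipath_bws, policy_bw=None):
    """Pick the link bandwidth to announce when setting next-hop-self.

    best_bw: bandwidth on the best path, or None if the community is absent.
    transitive: whether the best path's community is transitive.
    multipath_bws: bandwidths of all multipaths (None where missing).
    policy_bw: value set by policy, which is never overridden.
    """
    if policy_bw is not None:
        return policy_bw
    if best_bw is None or not transitive:
        return best_bw  # nothing to rewrite
    # Cumulative downstream bandwidth: sum over all multipaths.
    return sum(bw for bw in multipath_bws if bw is not None)
```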
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
Reviewed-by: Donald Sharp <sharpd@cumulusnetworks.com>
Reviewed-by: Don Slice <dslice@cumulusnetworks.com>
Implement the code to handle the other route-map options to generate
the link bandwidth, namely, to use the cumulative bandwidth or to
base this on the number of multipaths. In the latter case, a reference
bandwidth is internally chosen - the implementation uses a value of
1 Gbps.
These additional options mean that the prefix may need to be advertised
if there is a link bandwidth change, which is a new criterion. Define a
new path (change) flag to support this and implement the advertisement.
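
To illustrate the num-multipaths option, here is a Python sketch. The
1 Gbps reference comes from the text above; multiplying it by the multipath
count and the helper names are assumptions made for the example.

```python
# 1 Gbps reference bandwidth, expressed in bytes per second.
REF_BW_BYTES_PER_SEC = 1_000_000_000 // 8


def bw_from_num_multipaths(num_multipaths):
    # Assumption: the generated bandwidth scales linearly with the count.
    return num_multipaths * REF_BW_BYTES_PER_SEC


def link_bw_changed(old_bw, new_bw):
    # A differing value is what would set the new path (change) flag
    # and trigger re-advertisement of the prefix.
    return old_bw != new_bw
```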
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
Reviewed-by: Donald Sharp <sharpd@cumulusnetworks.com>
Reviewed-by: Don Slice <dslice@cumulusnetworks.com>
The BGP link bandwidth extended community must not be repeated. If the
attribute already carries this and the route-map specifies a new value,
the implementation will honor the policy configuration and overwrite
the existing values.
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
Certain extended communities cannot be repeated. An example is the
BGP link bandwidth extended community. Enhance the extended community
add function to ensure uniqueness, if requested.
Note: This commit does not change the lack of uniqueness for any of
the already-supported extended communities. Many of them such as the
BGP route target can obviously be present multiple times. Others like
the Router's MAC should most probably be present only once. The portions
of the code which add these may already be structured such that duplicates
do not arise.
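
A sketch of what "ensure uniqueness, if requested" can mean, in Python. The
function name and the choice to match on the first two octets (type and
sub-type) of each 8-octet value are assumptions for illustration.

```python
def ecommunity_add(values, new, unique=False):
    """Return a new sorted list of 8-octet extended communities.

    With unique=True, any existing value sharing the type and sub-type
    octets of `new` is dropped first, so `new` replaces it.
    """
    if len(new) != 8:
        raise ValueError("extended community must be 8 octets")
    if unique:
        values = [v for v in values if v[:2] != new[:2]]
    if new not in values:
        values = values + [new]
    return sorted(values)
```

Called without unique=True, the function behaves as a plain set insert, so
communities such as route targets can still appear several times.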
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
Perform weighted ECMP if the multipaths have link bandwidth. This involves
assigning weights to each of the next hops associated with the prefix based
on the link bandwidth of the corresponding path as a factor of the total
(cumulative) link bandwidth for the prefix. The weight values used are
between 1 and 100. Weights are assigned only if all paths in the multipath
have link bandwidth, otherwise any bandwidths are ignored and regular
ECMP is performed. This is as recommended in
https://tools.ietf.org/html/draft-ietf-idr-link-bandwidth
A subsequent commit will implement additional (user-configurable) behaviors.
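
The weight assignment can be sketched like this in Python. The 1 to 100
range and the all-or-nothing rule come from the text above; the function
name and the exact rounding (integer division, clamped to at least 1) are
assumptions.

```python
def nexthop_weights(bandwidths):
    """Return one weight per path, or None to fall back to regular ECMP.

    bandwidths: link bandwidth per path, None where it is missing.
    """
    if not bandwidths or any(bw is None for bw in bandwidths):
        return None  # some path lacks bandwidth: ignore all of them
    total = sum(bandwidths)
    if total == 0:
        return None
    # Each path's share of the cumulative bandwidth, scaled to 1..100.
    return [max(1, bw * 100 // total) for bw in bandwidths]
```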
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
Reviewed-by: Donald Sharp <sharpd@cumulusnetworks.com>
Reviewed-by: Don Slice <dslice@cumulusnetworks.com>
During multipath update, track the cumulative link bandwidth
as well as update flags appropriately.
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
Reviewed-by: Donald Sharp <sharpd@cumulusnetworks.com>
Reviewed-by: Don Slice <dslice@cumulusnetworks.com>
Introduce fields in the multipath structure for link bandwidth handling.
In the process, the mp_count field is changed to a uint16 as that is the
value set anyway.
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
Additional extended community definitions and display of link-bandwidth
extended community.
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
Implement route-map option to set the link-bandwidth extended
community. The command is of the form:
set extcommunity bandwidth <(1-26214400)|cumulative|num-multipaths>
[non-transitive]
The options available are to specify the actual bandwidth value in
Mbps, base it on the cumulative downstream bandwidth or base it on
the number of multipaths. The last option is based on
https://tools.ietf.org/html/draft-mohanty-bess-ebgp-dmz. Further,
in alignment with the use case described in this IETF draft, the
extended community is encoded as transitive by default. There is an
option available to specify that it should be non-transitive.
The link-bandwidth itself is carried in bytes per second as specified in
https://tools.ietf.org/html/draft-ietf-idr-link-bandwidth
Note: This commit only handles the processing for bandwidth specified
as a value; subsequent commits will handle the processing of the other
options.
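
For the Mbps to bytes-per-second conversion, a Python sketch follows. The
octet layout (type, sub-type 0x04, 2-octet AS, 4-octet IEEE float) and the
type values 0x00 and 0x40 reflect my reading of the draft and of two-octet
AS extended communities; treat them as assumptions, as are the helper
names.

```python
import struct


def mbps_to_bytes_per_sec(mbps):
    # The CLI takes Mbps; the community carries bytes per second.
    return mbps * 1_000_000 // 8


def encode_link_bandwidth(asn, mbps, transitive=True):
    # Assumed layout: type, sub-type, 2-octet AS, 4-octet IEEE float.
    ecom_type = 0x00 if transitive else 0x40
    value = float(mbps_to_bytes_per_sec(mbps))
    return struct.pack(">BBHf", ecom_type, 0x04, asn, value)
```

For example, a configured value of 1000 (1 Gbps) corresponds to 125000000
bytes per second on the wire.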
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
Reviewed-by: Donald Sharp <sharpd@cumulusnetworks.com>
Reviewed-by: Don Slice <dslice@cumulusnetworks.com>