If zebra is not started, VRF identifiers are not available, which
prevents import/export from being available. This commit makes
import/export available even when zebra is not started.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
The motivation for this patch is to address a concerning behavior of
tx-addpath-bestpath-per-AS. Prior to this patch, all paths' TX ID was
pre-determined as the path was received from a peer. However, this meant
that any time the path selected as best from an AS changed, bgpd had no
choice but to withdraw the previous best path, and advertise the new
best-path under a new TX ID. This could cause significant network
disruption, especially for the subset of prefixes coming from only one
AS that were also communicated over a bestpath-per-AS session.
The patch's general approach is best illustrated by
txaddpath_update_ids. After a bestpath run (required for best-per-AS to
know what will and will not be sent as addpaths) ID numbers will be
stripped from paths that no longer need to be sent, and held in a pool.
Then, paths that will be sent as addpaths and do not already have ID
numbers will allocate new ID numbers, pulling first from that pool.
Finally, anything left in the pool will be returned to the allocator.
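
The three steps above can be sketched roughly as follows. This is a minimal illustration, not the code of txaddpath_update_ids: the struct names, the fixed-size ID space (MAX_IDS) and the helpers id_alloc_get/id_alloc_put/update_ids are all assumptions made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_IDS 64 /* hypothetical small ID space for this sketch */

/* Hypothetical allocator tracking which IDs are in use; ID 0 means "none". */
struct id_alloc {
	bool used[MAX_IDS + 1];
};

static uint32_t id_alloc_get(struct id_alloc *a)
{
	for (uint32_t id = 1; id <= MAX_IDS; id++) {
		if (!a->used[id]) {
			a->used[id] = true;
			return id;
		}
	}
	return 0; /* ID space exhausted */
}

static void id_alloc_put(struct id_alloc *a, uint32_t id)
{
	assert(id >= 1 && id <= MAX_IDS && a->used[id]);
	a->used[id] = false;
}

struct path {
	uint32_t tx_id; /* 0 when the path holds no TX ID */
	bool advertise; /* outcome of the bestpath run */
};

/* Run after bestpath: strip IDs, reassign from the pool, return leftovers. */
static void update_ids(struct id_alloc *a, struct path *paths, size_t n)
{
	uint32_t pool[MAX_IDS];
	size_t pool_len = 0;

	/* 1. Strip IDs from paths that no longer need to be sent. */
	for (size_t i = 0; i < n; i++) {
		if (!paths[i].advertise && paths[i].tx_id) {
			pool[pool_len++] = paths[i].tx_id;
			paths[i].tx_id = 0;
		}
	}

	/* 2. Paths that will be sent and lack an ID pull from the pool first. */
	for (size_t i = 0; i < n; i++) {
		if (paths[i].advertise && !paths[i].tx_id)
			paths[i].tx_id = pool_len ? pool[--pool_len]
						  : id_alloc_get(a);
	}

	/* 3. Anything left in the pool goes back to the allocator. */
	while (pool_len)
		id_alloc_put(a, pool[--pool_len]);
}
```

In this sketch, when the best path for an AS changes, the new best path picks up the ID the old one just gave up, so the peer sees an update under the same TX ID rather than a withdraw followed by an add.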
In order for this to work, ID numbers had to be split by strategy. The
tx-addpath-All strategy would keep every ID number "in use" constantly,
preventing IDs from being transferred to different paths. Rather than
create two variables for ID, this patch creates a more generic array that
will easily enable more addpath strategies to be implemented. The
previously described ID manipulations will happen per addpath strategy,
and will only be run for strategies that are enabled on at least one
peer.
Finally, the ID numbers are allocated from an allocator that tracks per
AFI/SAFI/Addpath Strategy which IDs are in use. Though it would be very
improbable, there was the possibility with the free-running counter
approach for rollover to cause two paths on the same prefix to get
assigned the same TX ID. As remote as the possibility is, we prefer to
not leave it to chance.
This ID re-use method is not perfect. In some cases you could still get
withdraw-then-add behaviors where not strictly necessary. In the case of
bestpath-per-AS this requires one AS to advertise a prefix for the first
time, then a second AS withdraws that prefix, all within the space of an
already pending MRAI timer. In those situations a withdraw-then-add is
more forgivable, and fixing it would probably require a much more
significant effort, as IDs would need to be moved to ADVs instead of
paths.
Signed-off-by: Mitchell Skiba <mskiba@amazon.com>
When we have a late registration of the Extended Nexthop capability
for BGP and the peer already has nexthop information stored, go
through and enable RA on the important interfaces.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Allow some debug notification when we are unable to talk
to zebra due to the connection not being there yet.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The peer->group pointer is set only if the PEER_STATUS_GROUP flag is
set in the peer. Add a protection to prevent a NULL pointer dereference.
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
The ->hash_cmp and linked list ->cmp functions were sometimes
being used interchangeably and this really is not a good
thing. So let's modify the hash_cmp function pointer to return
a boolean and convert everything to use the new syntax.
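
To illustrate why mixing the two is dangerous, here is a small sketch with a hypothetical struct item. A list-style comparator returns 0 when the elements are equal, while a hash-style comparator returns true when they are equal, so swapping one for the other inverts the result. Giving them different return types makes the intent visible at each call site.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct item {
	int key; /* hypothetical element type for this sketch */
};

/* List-style comparator: <0, 0 or >0, so 0 means "equal". */
static int item_list_cmp(const void *a, const void *b)
{
	const struct item *x = a, *y = b;

	assert(x != NULL && y != NULL);
	return (x->key > y->key) - (x->key < y->key);
}

/* Hash-style comparator: true means "equal". */
static bool item_hash_cmp(const void *a, const void *b)
{
	const struct item *x = a, *y = b;

	assert(x != NULL && y != NULL);
	return x->key == y->key;
}
```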
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
When we add/remove peers we need to do a bit better job
of tracking them in the bgp->peerhash.
1) When we have the doppelganger take over, make sure the
winner is the one represented in the peerhash.
2) When creating the doppelganger, leave the current one
in place instead of blindly replacing it.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Cleanup calls where we were passing in the su for
peer creation a tiny bit.
Creating a peer from the cli will always have a conf_if *or*
a su but not both, while a doppelganger will have both.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
During peer startup there exists the possibility that both
local and remote peers try to start communication at the
same time. In addition it is possible for local configuration
to change at the same time this is going on. When this happens
try to notice that the remote peer may be in opensent or openconfirm
and if so we need to restart the connection from both sides.
Additionally try to write a bit of extra code in peer_xfer_conn
to notice when this happens and to emit an error message to
the end user about this happening so that it can be cleaned up.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
All I can see is an unnecessary complication. If there's some purpose
here it needs to be documented...
Signed-off-by: David Lamparter <equinox@diac24.net>
Corrections so that the BGP daemon can work with the label manager properly
through a label-manager proxy. Details:
- Correction so the BGP daemon behind a proxy label manager gets the range
correctly (-I added to the BGP daemon, to set the daemon instance id)
- For the BGP case, added an asynchronous label manager connect command so
the labels get recycled in case of a BGP daemon reconnection. With this,
BGPd and LDPd would behave similarly.
Signed-off-by: F. Aragon <paco@voltanet.io>
Problem reported that some bgp and ospf json commands did not return
any json output at all if the bgp/ospf instance did not exist.
Additionally, some bgp and ospf json commands did not return any json
output if the instance existed but no neighbors were defined. This
fix makes these commands more consistent in returning empty braces for
json output and issuing a message if not using json output. Additionally,
made the flag "use_json" a bool to make it consistent since previously,
it had been defined as an int, char, u_char, and bool at various places.
Ticket: CM-21040
Signed-off-by: Don Slice <dslice@cumulusnetworks.com>
While perusing CONFDATE I noticed that we had a couple of
CONFDATE 201805 entries, which we were not picking up (for other
reasons, fixed in a different PR). Upon investigation
I noticed that the commits were made in 201805, so these
CONFDATEs should be in 2019.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Several zlog_warns were being used to tell the end
user that bgp had detected a bug. These all look like information
added during development that can be noted as debugs or logged
as an error situation.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The code for this was always there but was not kicking in because of an
incorrect dependency on is_evpn_enabled. This API attempts to locate the
default instance from bgp_master's instance list. Only the instance
currently being deleted has already been removed from the instance list
by the time bgp_delete->bgp_zebra_instance_deregister is executed.
Symptom of this bug used to show up when a default instance is deleted
and created again. In that case bgp_zebra_instance_register would not be
effective as zebra ignores the register as dup (dereg didn't happen in the
first place) so bgpd wouldn't reload already configured L2-VNIs.
root@cel-sea-03:~# net show bgp l2vpn evpn vni |grep 1000
* 1000 L2 169.253.0.11:9 6646:1000 6646:1000 vrf1
root@cel-sea-03:~# grep "router bgp" /etc/frr/frr.conf
router bgp 6646
root@cel-sea-03:~# sed -i 's/6646/6656/' /etc/frr/frr.conf
root@cel-sea-03:~# grep "router bgp" /etc/frr/frr.conf
router bgp 6656
root@cel-sea-03:~# systemctl reload frr
root@cel-sea-03:~# net show bgp l2vpn evpn vni |grep 1000
root@cel-sea-03:~#
Fix simply changes the order of dereg to make
bgp_zebra_instance_deregister actually happen (by doing it before the
default instance is removed from the master list).
Ticket: CM-21566
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
When a bgp instance is stopped, with a `no router bgp..`
make sure any timers associated with the instance are stopped
as well.
This issue was discovered when a customer issued a `no router bgp`
while a maxmed timer was operative. The max-med timer used the
`struct bgp *` as the passed in value for the thread. The
thread eventually popped after the cleanup, attempted to use
memory that had already been freed, and crashed.
Ticket: CM-21895
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
This commit removes various parts of the bgpd implementation code which
are unused/useless, e.g. unused functions, unused variable
initializations, unused structs, ...
Signed-off-by: Pascal Mathis <mail@pascalmathis.com>
The current behavior of the `bgp default shutdown` command is to set the
state of all newly configured peers to shutdown. This leads to a problem
when restarting bgpd, because all peers will then be seen as newly
configured, which leads to all peers being set to shutdown after each
restart.
This behavior is undesired and not common when comparing the
implementation against other vendors. This commit moves the `bgp default
shutdown` configuration underneath the peer-group and peer
configuration, to ensure that existing peers will not be set to shutdown
after a daemon restart.
Signed-off-by: Pascal Mathis <mail@pascalmathis.com>
This commit finalizes the previous commits which introduced a generic
approach for making all BGP peer and address-family attributes
overrideable by keeping track of the configuration origin in separate
internal structures.
First of all, the test suite was greatly extended to also check the
internal data structures of peer/AF attributes, so that inheritance for
internal values like 'peer->weight' is also being checked in all cases.
This revealed some smaller issues in the implementation, which were also
fixed in this commit. The test suite now fully passes and covers all the
usual situations that should normally occur.
Signed-off-by: Pascal Mathis <mail@pascalmathis.com>
This commit introduces BGP peer-group overrides for the last set of
peer-level attrs which did not offer that feature yet. The following
attributes have been implemented: description, local-as, password and
update-source.
Each attribute, with the exception of description because it does not
offer any inheritance between peer-groups and peers, is now also setting
a peer-flag instead of just modifying the internal data structures. This
made it possible to also re-use the same implementation for attribute
overrides as already done for peer flags, AF flags and AF attrs.
The `no neighbor <neigh> description` command has been slightly changed
to support negation for no parameters, one parameter or * parameters
(LINE...). This was needed for the test suite to pass and is a small
change without any bigger impact on the CLI.
Signed-off-by: Pascal Mathis <mail@pascalmathis.com>
This commit implements BGP peer-group overrides for the timer flags,
which control the value of the hold, keepalive, advertisement-interval
and connect timers. It was kept separated on purpose as the
whole timer implementation is quite complex and merging this commit
together with the other flag implementations did not seem right.
Basically three new peer flags were introduced, namely
*PEER_FLAG_ROUTEADV*, *PEER_FLAG_TIMER* and *PEER_FLAG_TIMER_CONNECT*.
The overrides work exactly the same way as they did before, but
introducing these flags made a few conditionals simpler as they no
longer had to compare internal data structures against each other.
Last but not least, the test suite has been adjusted accordingly to test
the newly implemented flag overrides.
Signed-off-by: Pascal Mathis <mail@pascalmathis.com>
This commit cleans up some ugly leftovers from previous flag-override
implementation and refactors the AF-flag override implementation to
match the same behavior the newly added peer-flag override
implementation has.
Signed-off-by: Pascal Mathis <mail@pascalmathis.com>
The current implementation of the overrides for peer address-family
attributes suffered a bug, which caused all peer-specific attributes to
be lost when the peer was added to a peer-group which already had that
specific address-family active.
This commit extends the *peer_group2peer_config_copy_af* function to
respect overridden flags properly. Additionally, the arguments of the
macros *PEER_ATTR_INHERIT* and *PEER_STR_ATTR_INHERIT* have been
reordered to be more consistent and easy to read.
This commit also adds further test cases to the BGP peer attributes test
suite, so that this kind of error is being caught in future commits. The
missing AF-attribute *distribute-list* has also been added to the test
suite.
Signed-off-by: Pascal Mathis <mail@pascalmathis.com>
The current implementation of peer flags (e.g. shutdown, passive, ...)
only has partial support for overriding flags of a peer-group when the
peer is a member. Often settings might get lost if the user toys around
with the peer-group configuration, which can lead to disaster.
This commit introduces the same override implementation which was
previously integrated to support proper peer flag/attribute override on
the address-family level. The code is very similar and the global
attributes now use their separate state-arrays *flags_invert* and
*flags_override*.
The test suite for BGP peer attributes was extended to also check peer
global attributes, so that the newly introduced changes are covered. An
additional feature was added which allows testing an attribute with an
*interface-peer*, which can be configured by running `neighbor IF-TEST
interface`. This was introduced so that the dynamic runtime inversion of
the `extended-nexthop` flag, which is only enabled by default for
interface peers, can also be tested.
Last but not least, two small changes have been made to the current bgpd
implementation:
- The command `strict-capability-match` can now also be set on a
peer-group; it seems like this command slipped through when
peer-groups were implemented long ago.
- The macro `COND_FLAG` was introduced inside lib/zebra.h, which now
allows either setting or unsetting a flag based on a condition. The syntax
for using this macro is: `COND_FLAG(flag_variable, flag, condition)`
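
A sketch of how such a macro can be built and used is shown below. The helper macros are written here in the usual set/unset/check style and are assumptions for the example rather than a copy of lib/zebra.h; EXAMPLE_FLAG_PASSIVE and apply_passive() are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed helper macros, written in the usual set/unset/check style. */
#define SET_FLAG(V, F) ((V) |= (F))
#define UNSET_FLAG(V, F) ((V) &= ~(F))
#define CHECK_FLAG(V, F) ((V) & (F))

/* Set F in V when C is true, otherwise clear it. */
#define COND_FLAG(V, F, C) ((C) ? (SET_FLAG(V, F)) : (UNSET_FLAG(V, F)))

#define EXAMPLE_FLAG_PASSIVE (1U << 0) /* hypothetical flag bit */

/* Replaces an if/else that would call SET_FLAG or UNSET_FLAG. */
static uint32_t apply_passive(uint32_t flags, int enable)
{
	assert(enable == 0 || enable == 1);
	COND_FLAG(flags, EXAMPLE_FLAG_PASSIVE, enable);
	return flags;
}
```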
Signed-off-by: Pascal Mathis <mail@pascalmathis.com>
Crash w/ an assert if someone calls bgp_delete with a
NULL parameter as opposed to crashing when we dereference
the pointer a bit later.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
When we are determining the state of a peer, we sometimes
detect that we should update the peer->su. The bgp->peerhash
keeps a hash of peers based upon the peer->su. This requires
us to release the stored value before we re-insert it.
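
The ordering matters because the entry has to be found under its old key. A minimal sketch, using a toy chained hash keyed on an unsigned integer standing in for peer->su (all names here are hypothetical, not the lib/hash API):

```c
#include <assert.h>
#include <stddef.h>

#define NBUCKETS 8

struct peer {
	unsigned su; /* stand-in for peer->su, used as the hash key */
	struct peer *next;
};

struct peerhash {
	struct peer *bucket[NBUCKETS];
};

static void ph_insert(struct peerhash *h, struct peer *p)
{
	struct peer **b = &h->bucket[p->su % NBUCKETS];

	p->next = *b;
	*b = p;
}

/* Unlink p from the bucket selected by its CURRENT key. */
static void ph_release(struct peerhash *h, struct peer *p)
{
	struct peer **pp = &h->bucket[p->su % NBUCKETS];

	while (*pp && *pp != p)
		pp = &(*pp)->next;
	if (*pp)
		*pp = p->next;
}

static struct peer *ph_lookup(struct peerhash *h, unsigned su)
{
	for (struct peer *p = h->bucket[su % NBUCKETS]; p; p = p->next)
		if (p->su == su)
			return p;
	return NULL;
}

/* Release under the old key, change the key, then re-insert. */
static void peer_update_su(struct peerhash *h, struct peer *p, unsigned new_su)
{
	assert(h != NULL && p != NULL);
	ph_release(h, p);
	p->su = new_su;
	ph_insert(h, p);
}
```

If the key were changed before the release, ph_release() would search the wrong bucket and the stale entry would stay behind in the old one.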
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Cleanup the leaked ecommunity data that we may have on shutdown.
Cleanup leaked vrf name strings on shutdown.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
This commit fixes all outstanding style/formatting issues as detected by
'git clang-format' or 'checkpatch' for the new peer-group override
implementation, which spanned across several commits.
Signed-off-by: Pascal Mathis <mail@pascalmathis.com>
The previous commit introduced very strict unit tests which check all
three involved components (config input, config output, internal data
structures) which revealed two more bugs in the peer-group override
implementation.
This commit fixes overrides for 'allowas-in <number>' and
'unsuppress-map', which both had a small mistake/typo causing those
issues.
Signed-off-by: Pascal Mathis <mail@pascalmathis.com>
This commit fixes peer-group overrides for inverted AF flags. This
implementation is currently only being used by the three 'send-community'
flags. Commit 70ee29b4d introduced generic support for overriding AF
flags, but did not support inverted flags.
By introducing an additional array on the BGP peer structure called
'af_flags_invert' all current and future flags which should work in an
inverted way can now also be properly overridden.
The CLI commands will work exactly the same way as before, just that 'no
<command>' now sets the flag and override whereas '<command>' will unset
the flag and remove the override.
Signed-off-by: Pascal Mathis <mail@pascalmathis.com>
This commit adds the same peer-group override capabilities as d122d7cf7
for all filter/map options that can be enabled/disabled on each
address-family of a BGP peer.
All currently existing filter/map options are being supported:
filter-list, distribute-list, prefix-list, route-map and unsuppress-map
To implement this behavior, a new peer attribute 'filter_override' has
been added together with various PEER_FT_ (filter type) constants for
tracking the state of each filter in the same way as it is being done
with 'af_flags_override'.
Signed-off-by: Pascal Mathis <mail@pascalmathis.com>
The current implementation for overriding peer-group configuration on a
peer member consists of several bandaids, which introduce more issues
than they fix. A generic approach for implementing peer-group overrides
for address-family flags is clearly missing.
This commit implements a generic and sane approach to overriding
peer-group configuration on a peer-member. A separate peer attribute
called 'af_flags_override' which was introduced in 04e1c5b is being used
to keep track of all address-family flags, storing whether the
configuration is being inherited from the parent-group or overridden.
All address-family flags are being supported by this implementation
(note: flags, not filters/maps) except 'send-community', which currently
breaks due to having the three flags enabled by default, which is not
being properly handled within this commit; all flags are supposed to
have an 'off'/'false' state by default.
In the interest of readability and comprehensibility, the flag
'send-community' is being fixed in a separate commit.
The following rules apply when looking at the new peer-group override
implementation this commit provides:
- Each peer-group can enable every flag (except the limitations noted
above), which gets automatically inherited by all members.
- Each peer can enable each flag independently and/or modify their
value, if available. (e.g.: weight <value>)
- Each command executed on a neighbor/peer gets explicitly set as an
override, so even when the peer-group has the same kind of
configuration, both will show up in 'show running-configuration'.
- Executing 'no <command>' on a peer will remove the peer-specific
configuration and make the peer inherit the configuration from the
peer-group again.
- Executing 'no <command>' on a peer-group will only remove the flag
from the peer-group, however not from peers explicitly setting that
flag.
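
The rules above can be sketched with a per-member override bitmask. This is a simplified illustration under assumed names (struct grp, struct mbr, mbr_set, mbr_unset, grp_change); only the idea of a parallel override array comes from this commit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FLAG_A (1U << 0) /* hypothetical address-family flag */

struct grp {
	uint32_t flags;
};

struct mbr {
	struct grp *group;
	uint32_t flags;          /* effective flags */
	uint32_t flags_override; /* bits configured on the member itself */
};

/* Take the group's value unless the member overrides this flag. */
static void mbr_inherit(struct mbr *m, uint32_t flag)
{
	if (m->flags_override & flag)
		return;
	if (m->group && (m->group->flags & flag))
		m->flags |= flag;
	else
		m->flags &= ~flag;
}

/* '<command>' on a member: set the flag and record it as an override. */
static void mbr_set(struct mbr *m, uint32_t flag)
{
	m->flags |= flag;
	m->flags_override |= flag;
}

/* 'no <command>' on a member: drop the override, inherit again. */
static void mbr_unset(struct mbr *m, uint32_t flag)
{
	m->flags_override &= ~flag;
	mbr_inherit(m, flag);
}

/* '[no] <command>' on the group: members without an override follow. */
static void grp_change(struct grp *g, struct mbr *members, size_t n,
		       uint32_t flag, int enable)
{
	assert(g != NULL);
	if (enable)
		g->flags |= flag;
	else
		g->flags &= ~flag;
	for (size_t i = 0; i < n; i++)
		mbr_inherit(&members[i], flag);
}
```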
This guarantees a clean implementation which does not break, even when
constantly messing with the flags of a peer-group. The same behavior is
present in Cisco devices, so people familiar with those should feel safe
when dealing with FRR's peer-groups.
The only restriction that now applies is that a single peer cannot
disable a flag which was set by a peer-group, because 'no <command>' is
already being used for disabling a peer-specific override. This is not
supported by any known vendor though, would require many specific
edge-cases and magic comparisons and will most likely only end up
confusing the user. Additionally, peer-groups should only contain flags
which are being used by all peer members.
Signed-off-by: Pascal Mathis <mail@pascalmathis.com>
Sometimes at startup, BGP Flowspec may be allocated a routing table
identifier not in the range of the predefined table range.
This issue occurs because BGP peering comes up before BGP has
retrieved the Table Range allocator.
The fix ensures that BGP PBR entries are not installed until the
routing table identifier range has been obtained. Once the range is
obtained, the FS entries are parsed to check that all selected
entries are installed; any that are not are then installed.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Policy routing is configurable via the address-family ipv4 flowspec
subfamily node. It is then possible to restrict flowspec operation
through the BGP instance to one or several interfaces, rather than all.
Two commands are available:
[no] local-install [IFNAME]
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
This commit moves the command 'bgp enforce-first-as' from global BGP
instance configuration to peer/neighbor configuration, which can now be
changed by executing '[no] neighbor <neighbor> enforce-first-as'.
End users can now enforce sane first-AS checking on regular sessions
while e.g. disabling the checks on routeserver sessions, which usually
strip away their own AS number from the path.
To ensure backwards-compatibility, a migration routine was added which
automatically sets the 'enforce-first-as' flag on all configured
neighbors if the old global setting was activated. The old global
command immediately disappears after running the migration routine once.
Signed-off-by: Pascal Mathis <mail@pascalmathis.com>
This flag needs to be set by default for l2vpn evpn address-family.
We needed to find a place in the code which gets called by all peers
at some point in the state machine and before the routes are advertised.
peer_new seems like the right place for this
as we are setting other default af_flags here as well.
Signed-off-by: Mitesh Kanjariya <mitesh@cumulusnetworks.com>
In the FRR implementation of EVPN,
eBGP leaf-spine peering for EVPN is fully supported by allowing
the next hop to be propagated and not rewritten at each hop.
There are other changes also related to route import to facilitate this.
However, propagating the next hop is not correct in some cases.
Specifically, if the DC is comprised of multiple PODs
with distinct intra-POD and inter-POD VxLAN tunnels,
EVPN routes received from an adjacent POD by a border/exit leaf
must be propagated into the local POD with the next hop rewritten (to self).
Signed-off-by: Mitesh Kanjariya <mitesh@cumulusnetworks.com>
Attribute set on peer was being overridden when set on the peer-group.
This commit also adds a parallel flags array that indicates whether a
particular flag is sourced from the peer-group or is peer-specific. It
assumes the default state of all flags is unset. This looks to be true
except in the case of PEER_FLAG_SEND_COMMUNITY,
PEER_FLAG_SEND_EXT_COMMUNITY, and PEER_FLAG_SEND_LARGE_COMMUNITY; these
flags are set by default except when the user specifies to use
config-type = cisco. However the flag field can merely be flipped to
mean the negation of those options in a future commit.
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
Upon BGP destroy, the hash lists related to PBR are removed:
first the pbr_match entries, along with the pbr_match_entries
they contain,
then the pbr_action entries. The order is important, since the former
reference pbr_action, so those references must be removed before
the pbr_action entries are removed.
The associated zebra contexts are removed as well.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
bgp structure is being extended with hash sets that will be used by
flowspec to give policy routing facilities.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Setup a per-VRF identifier to use along with the Router Id to build the
RD. Define a function to encode the RD. Code is brought over from EVPN
and EVPN code has been modified to use the generic function.
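
As an illustration of the encoding, a Type 1 RD (RFC 4364) is 8 bytes: a 2-byte type field set to 1, a 4-byte IPv4 address and a 2-byte assigned number. The sketch below assumes the Router Id is passed in host byte order and uses a hypothetical function name (rd_encode_ip); it is not the function added by this commit.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define RD_LEN 8
#define RD_TYPE_IP 1 /* Type 1: IPv4 address + 2-byte assigned number */

/* Build an RD from the Router Id (host byte order) and a per-VRF id. */
static void rd_encode_ip(uint32_t router_id, uint16_t vrf_rd_id,
			 uint8_t rd[RD_LEN])
{
	assert(rd != NULL);
	memset(rd, 0, RD_LEN);
	rd[0] = 0;
	rd[1] = RD_TYPE_IP;
	rd[2] = (uint8_t)(router_id >> 24);
	rd[3] = (uint8_t)(router_id >> 16);
	rd[4] = (uint8_t)(router_id >> 8);
	rd[5] = (uint8_t)router_id;
	rd[6] = (uint8_t)(vrf_rd_id >> 8);
	rd[7] = (uint8_t)vrf_rd_id;
}
```

With Router Id 192.0.2.1 and per-VRF identifier 9, this yields the RD usually displayed as 192.0.2.1:9.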
Ticket: CM-20256
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
add the `import vrf XXXX` command
router bgp 4 vrf DONNA
<config>
!
router bgp 4 vrf EVA
<config>
address-family ipv4 uni
import vrf DONNA
!
!
This command will allow for vrf EVA to specify that it would like
to receive the routes from vrf DONNA into its table.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Add support for CLI "auto" keyword in vrf->vpn export label:
router bgp NNN vrf FOO
address-family ipv4 unicast
label vpn export auto
exit-address-family
Signed-off-by: G. Paul Ziemba <paulz@labn.net>
MPLS label pool backed by allocations from the zebra label manager.
A caller requests a label (e.g., in support of an "auto" label
specification in the CLI) via lp_get(), supplying a unique ID and
a callback function. The callback function is invoked at a later
time with the unique ID and a label value to inform the requestor
of the assigned label.
Requestors may release their labels back to the pool via lp_release().
The label pool is stocked with labels allocated by the zebra label
manager. The interaction with zebra is asynchronous so that bgpd
is not blocked while awaiting a label allocation from zebra.
The label pool implementation allows for bgpd operation before (or
without) zebra, and gracefully handles loss and reconnection of
zebra. Of course, before initial connection with zebra, no labels
are assigned to requestors. If the zebra connection is lost and
regained, callbacks to requestors will invalidate old assignments
and then assign new labels.
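
The request/callback flow can be sketched as below. This is a simplified, single-threaded model with hypothetical names (pool_get, pool_release, pool_chunk_received, store_label) and fixed-size arrays; it does not reproduce the lp_get()/lp_release() signatures or the zebra interaction, and it omits the invalidation of old assignments on reconnection.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define POOL_MAX 16

typedef void (*label_cb)(void *id, uint32_t label);

struct request {
	void *id; /* unique ID supplied by the requestor */
	label_cb cb;
};

struct label_pool {
	uint32_t free_labels[POOL_MAX];
	size_t nfree;
	struct request pending[POOL_MAX];
	size_t npending;
};

/* Queue a request; the callback runs later, once a label is in stock. */
static void pool_get(struct label_pool *lp, void *id, label_cb cb)
{
	assert(lp->npending < POOL_MAX);
	lp->pending[lp->npending].id = id;
	lp->pending[lp->npending].cb = cb;
	lp->npending++;
}

/* Hand labels to queued requestors while stock lasts. */
static void pool_serve(struct label_pool *lp)
{
	size_t served = 0;

	while (served < lp->npending && lp->nfree) {
		struct request *r = &lp->pending[served++];

		r->cb(r->id, lp->free_labels[--lp->nfree]);
	}
	for (size_t i = served; i < lp->npending; i++)
		lp->pending[i - served] = lp->pending[i];
	lp->npending -= served;
}

/* Called when the label manager delivers a chunk (modelled synchronously). */
static void pool_chunk_received(struct label_pool *lp, uint32_t first,
				size_t count)
{
	for (size_t i = 0; i < count && lp->nfree < POOL_MAX; i++)
		lp->free_labels[lp->nfree++] = first + (uint32_t)i;
	pool_serve(lp);
}

/* Return a label to the pool and serve any waiting requestor. */
static void pool_release(struct label_pool *lp, uint32_t label)
{
	assert(lp->nfree < POOL_MAX);
	lp->free_labels[lp->nfree++] = label;
	pool_serve(lp);
}

/* Example callback: the unique ID points at the requestor's label slot. */
static void store_label(void *id, uint32_t label)
{
	*(uint32_t *)id = label;
}
```

A request made before any chunk has arrived simply waits in the pending queue, which mirrors the "no labels before the initial connection with zebra" behavior described above.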
Signed-off-by: G. Paul Ziemba <paulz@labn.net>
Routes that have labels must be sent via a nexthop that also has labels.
This change notes whether any path in a nexthop update from zebra contains
labels. If so, then the nexthop is valid for routes that have labels.
If a nexthop update has no labeled paths, then any labeled routes
referencing the nexthop are marked not valid.
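
The validity rule can be summarized in a short sketch. The types and function names below (struct nh_path, nh_update_has_labels, route_nh_valid) are hypothetical and only model the decision described above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct nh_path {
	bool has_label; /* hypothetical: path in the nexthop update has labels */
};

/* True if any path in the nexthop update carries labels. */
static bool nh_update_has_labels(const struct nh_path *paths, size_t n)
{
	assert(paths != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		if (paths[i].has_label)
			return true;
	return false;
}

/* A labeled route needs a labeled nexthop; an unlabeled route does not. */
static bool route_nh_valid(bool route_has_label, bool nh_resolved,
			   bool nh_has_labels)
{
	if (!nh_resolved)
		return false;
	return !route_has_label || nh_has_labels;
}
```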
Add a route flag BGP_INFO_ANNC_NH_SELF that means "advertise myself
as nexthop when announcing" so that we can track our notion of the
nexthop without revealing it to peers.
Signed-off-by: G. Paul Ziemba <paulz@labn.net>
This work is derived from a work done by China-Telecom.
That initial work can be found in [0].
As the gap between frr and quagga is significant, a rework has been
done in the meantime.
The initial work consists of bringing the following:
- Bringing the client side of flowspec.
- the enhancement of address-family ipv4/ipv6 flowspec
- partial data path handling at reception has been prepared
- the support for ipv4 flowspec or ipv6 flowspec in BGP open messages,
and the internals of BGP has been done.
- the memory contexts necessary for flowspec has been provisioned
In addition to this work, the following has been done:
- the complement of adaptation for FS safi in bgp code
- the code checkstyle has been reworked so as to match frr checkstyle
- the processing of IPv6 FS NLRI is prevented
- the processing of FS NLRI is stopped (temporary)
[0] https://github.com/chinatelecom-sdn-group/quagga_flowspec/
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Signed-off-by: jaydom <chinatelecom-sdn-group@github.com>
In BGP, policy routing requires the use of table identifiers, and the
flowspec protocol will need them. One API from bgp zebra has been
added to get the table chunk.
Internally, once flowspec is enabled, the BGP engine will try to
connect smoothly to the table manager. If zebra is not connected, it
will try to connect 10 seconds later. If zebra is connected and the
request succeeds, a polling mechanism that runs every 60 seconds is
put in place. None of this internal mechanism impacts the BGP process.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
The following types are nonstandard:
- u_char
- u_short
- u_int
- u_long
- u_int8_t
- u_int16_t
- u_int32_t
Replace them with the C99 standard types:
- uint8_t
- unsigned short
- unsigned int
- unsigned long
- uint8_t
- uint16_t
- uint32_t
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
This commit relies on bgp vpn-policy. Several bgp vrf instances need
to be configured, and in each bgp instance the following command must
be configured under the address-family ipv4 unicast node:
[no] rt redirect import RTLIST
Then, a function is provided that walks the BGP instances.
The incoming ecommunity is compared with the configured rt redirect
import ecommunity list, and the first VRF instance with a matching
route target is returned.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Previous patches to suppress display of automatically calculated
coalesce-time did not fully work because the flag indicating whether the
value was automatically calculated was not set properly upon creation.
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
PR #1739 added code to leak routes between (default VRF) VPN safi and unicast RIBs in any VRF. That set of changes included temporary CLI including vpn-policy blocks to specify RD/RT/label/&c. After considerable discussion, we arrived at a consensus CLI shown below.
The code of this PR implements the vpn-specific parts of this syntax:
router bgp <as> [vrf <FOO>]
address-family <afi> unicast
rd (vpn|evpn) export (AS:NN | IP:nn)
label (vpn|evpn) export (0..1048575)
rt (vpn|evpn) (import|export|both) RTLIST...
nexthop vpn (import|export) (A.B.C.D | X:X::X:X)
route-map (vpn|evpn|vrf NAME) (import|export) MAP
[no] import|export [vpn|evpn|evpn8]
[no] import|export vrf NAME
User documentation of the vpn-specific parts of the above syntax is in PR #1937
Signed-off-by: G. Paul Ziemba <paulz@labn.net>
- add "debug bgp vpn label" CLI
- improved debug messages for "debug bgp bestpath"
- send vrf label to zebra after zebra informs bgpd of vrf_id
- withdraw vrf_label from zebra if zebra informs bgpd that vrf_id is disabled
Signed-off-by: G. Paul Ziemba <paulz@labn.net>
The work_queue_free function free'd up the wq pointer but
did not set it to NULL. This of course causes situations
where we may use the work_queue after it is freed. Let's
modify the work_queue to set the pointer for you.
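
The usual way to do this in C is to pass the address of the pointer, so the free function can clear the caller's copy. A minimal sketch with a hypothetical struct wq and hypothetical function names:

```c
#include <assert.h>
#include <stdlib.h>

struct wq {
	int dummy; /* hypothetical stand-in for the real work queue */
};

static struct wq *wq_new(void)
{
	return calloc(1, sizeof(struct wq));
}

/* Free the queue and clear the caller's pointer in one step. */
static void wq_free_and_null(struct wq **wqp)
{
	assert(wqp != NULL);
	free(*wqp);
	*wqp = NULL;
}
```

After the call the caller's pointer is NULL, so a later check such as `if (wq)` behaves correctly and a second call is harmless because free(NULL) is a no-op.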
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
When VRF is not yet available at startup, the check for main socket
presence must be done. As the main socket creation is made in a separate
place from vrf socket for netns, the main socket creation must not be
prevented when a BGP VRF relies on vrf lite mechanism.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Upon creation of BGP instances, server socket may or may not be created.
In the case of VRF instances, if the VRF backend relies on NETNS, then
a new server socket will be created for each BGP VRF instance. If the
VRF backend relies on VRF LITE, then only one server socket will be
enough. Moreover, at startup, with BGP VRF configuration, a server
socket may not be created if VRF is not the default one or VRF is not
recognized yet.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
The change contained in this commit does the following:
- discovery of vrf id from zebra daemon, and adaptation of bgp contexts
with BGP.
The list of network addresses contain a reference to the bgp context
supporting the vrf.
The bgp context contains a vrf pointer that gives information about
the netns path in case the vrf is a netns path.
Only some contexts are impacted, namely socket creation, and retrieval
of local IP settings (this requires the vrf identifier).
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Signed-off-by: Daniel Walton <dwalton@cumulusnetworks.com>
peer->ifindex was only used in two places but it was never populated so
neither of them worked as they should. 'struct peer' also has a 'struct
interface' pointer which we can use to get the ifindex.
Use the new threading facilities provided in lib/ to streamline the
threads used in bgpd. In particular, all of the lifecycle code has been
removed from the I/O thread and replaced with the default loop. Did not
do the same to the keepalives thread as it is much smaller (doesn't need
the event system).
Also cleaned up some comments to match the style guide.
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
Adds ability to specify that peers should be administratively shutdown
when first configured.
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
When a peer configured with administrative shutdown is added to a peer
group, the administrative shutdown status is discarded and the peer will
enter the BGP FSM. This is not what we want. Preserve the flag instead.
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
The constant to limit # of allowed cli tokens on any one line was
defined in multiple places, all inconsistent with each other. Fix.
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
The BGP IO thread must be running before other threads
can start using it. So check once at startup that it
is running, instead of before every function call
into it.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The multithreading code has a comment that reads:
"XXX: Heavy abuse of stream API. This needs a ring buffer."
This patch makes the relevant code use a ring buffer.
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
Was using 0 as a sentinel value, so user couldn't configure 0 as the
value of the coalesce timer.
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
Currently, we have an rd_id bitfield
to assign a unique index for auto RD.
This bitfield currently resides under struct bgp which seems wrong.
We need to shift this to a global space
as this ID space is really global per box.
One more reason to keep it at a global data structure is,
the ID space could be used by both VNIs and VRFs.
Signed-off-by: Mitesh Kanjariya <mitesh@cumulusnetworks.com>
BGP VRF can be created/deleted either via config or via l3vni add/del.
We need to handle various sequences.
1. If user config is present, an l3vni del should not delete the vrf instance
2. Do not write bgp config in show running for an auto-created vrf
3. If an l3vni is present, disallow the cli for deleting the bgp vrf instance
4. If an l3vni is added and vrf config is present, set the flags properly
5. If bgp vrf is configured, unset the AUTO flag
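The sequences above can be sketched as a few small state transitions.
This is only an illustration: the struct, the flag name and every
function name below are hypothetical and are not taken from bgpd.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag: instance exists only because an l3vni was added. */
#define DEMO_VRF_FLAG_AUTO (1u << 0)

struct demo_vrf {
	uint32_t flags;
	uint32_t l3vni; /* 0 means no l3vni is attached */
};

/* l3vni add: mark the instance AUTO only if the add created it. */
void demo_vrf_l3vni_add(struct demo_vrf *v, uint32_t vni, bool created_now)
{
	v->l3vni = vni;
	if (created_now)
		v->flags |= DEMO_VRF_FLAG_AUTO;
}

/* User configured the vrf: it is no longer an auto-created instance. */
void demo_vrf_user_config(struct demo_vrf *v)
{
	v->flags &= ~DEMO_VRF_FLAG_AUTO;
}

/* l3vni del: returns true only if the instance should be deleted too. */
bool demo_vrf_l3vni_del(struct demo_vrf *v)
{
	v->l3vni = 0;
	return (v->flags & DEMO_VRF_FLAG_AUTO) != 0;
}

/* The cli may delete the instance only when no l3vni is attached. */
bool demo_vrf_cli_delete_allowed(const struct demo_vrf *v)
{
	return v->l3vni == 0;
}

/* Auto-created instances are not written to the running config. */
bool demo_vrf_in_running_config(const struct demo_vrf *v)
{
	return (v->flags & DEMO_VRF_FLAG_AUTO) == 0;
}
```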
Ticket: CM-18630
Review: CCR-6906
Testing: Manual
Signed-off-by: Mitesh Kanjariya <mitesh@cumulusnetworks.com>
Since coalesce time is now heuristically adjusted based on peer count,
we need to separate out specific configuration by the user from the
current value. Behavior established is to not adjust if the user has a
value set.
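A minimal sketch of that separation, assuming a hypothetical struct
that stores a "set by user" flag next to the value in effect (none of
these names come from bgpd). Keeping the flag separate also lets 0
remain a valid configured value.

```c
#include <stdbool.h>
#include <stdint.h>

struct demo_coalesce {
	bool user_set;    /* true once the operator configured a value */
	uint32_t time_ms; /* value currently in effect */
};

/* Operator configuration always wins; 0 is a legal value here. */
void demo_coalesce_set_by_user(struct demo_coalesce *c, uint32_t ms)
{
	c->user_set = true;
	c->time_ms = ms;
}

/* Heuristic updates are ignored while a user value is in place. */
void demo_coalesce_apply_heuristic(struct demo_coalesce *c, uint32_t ms)
{
	if (!c->user_set)
		c->time_ms = ms;
}
```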
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
If we are in OpenSent or OpenConfirm peer state and we receive a new
address-family activation, we would end up ignoring the new activation
and not tell our peer about it. You can notice this when a
'show bgp neighbor' command reports 'Not in any update group'
for a particular family.
This modifies the code such that we now notice that we are in
either OpenSent or OpenConfirm state and reset the peer to
allow us to send them the new capability.
Ticket: CM-19021
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The subgroup coalesce timer controls how long updates to a particular
subgroup are delayed in order to allow additional peers to join the
subgroup. Presently the timer value is 200 ms. Increase it to 1 second
and adjust up as peers are configured, with an upper cap at 10s.
This cuts convergence time by a factor of 3 at large scale (300+ peers,
1000+ prefixes per peer).
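As an illustration of the scaling rule, here is a sketch with the 1
second base and 10 second cap from the text. The per-peer step of 50 ms
is an assumption made up for this example; the message does not state
the actual increment.

```c
#include <stdint.h>

#define DEMO_COALESCE_BASE_MS 1000u /* 1 second base, from the text */
#define DEMO_COALESCE_MAX_MS 10000u /* 10 second cap, from the text */
#define DEMO_COALESCE_STEP_MS 50u   /* hypothetical per-peer increment */

/* Grow the timer linearly with the peer count, clamped at the cap. */
uint32_t demo_coalesce_time_ms(uint32_t peer_count)
{
	uint64_t t = DEMO_COALESCE_BASE_MS
		     + (uint64_t)peer_count * DEMO_COALESCE_STEP_MS;

	return t > DEMO_COALESCE_MAX_MS ? DEMO_COALESCE_MAX_MS : (uint32_t)t;
}
```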
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
bgpd supports setting a write-quanta that serves as a hint on how many
packets to write per I/O cycle. Now that input is buffered, it makes
sense to add the equivalent parameter for how many packets are processed
per cycle. This is *not* how many packets are read off the wire per I/O
cycle; rather it is how many packets are processed from the input buffer
in a given cycle after having been read off the wire and sanitized.
Since these values must be used from multiple threads, they have also
been made atomic.
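A rough sketch of an atomic read quanta using C11 atomics. The variable
name, its default of 10 and the helper functions are assumptions for
illustration, not the bgpd definitions.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical knob: packets processed from the input buffer per cycle. */
_Atomic uint32_t demo_rpkt_quanta = 10;

/* Called from the configuration path (e.g. the main thread). */
void demo_set_rpkt_quanta(uint32_t n)
{
	atomic_store_explicit(&demo_rpkt_quanta, n, memory_order_relaxed);
}

/*
 * Given the number of packets waiting in the input buffer, return how
 * many to process in this cycle. The quanta is read exactly once so a
 * concurrent update cannot change the limit in the middle of a cycle.
 */
uint32_t demo_packets_this_cycle(uint32_t queued)
{
	uint32_t limit = atomic_load_explicit(&demo_rpkt_quanta,
					      memory_order_relaxed);

	return queued < limit ? queued : limit;
}
```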
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
Instead of reading a packet header and the rest of the packet in two
separate i/o cycles, read a chunk of data at one time and then
parse as many packets as possible out of the chunk.
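The idea can be sketched as a loop over a byte chunk that relies only
on the BGP message header layout from RFC 4271 (16-byte marker, 2-byte
length, 1-byte type). The function name and return convention are
assumptions for this example and do not mirror bgp_packet.c.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_BGP_HEADER_SIZE 19u       /* marker + length + type */
#define DEMO_BGP_MAX_PACKET_SIZE 4096u /* RFC 4271 maximum */

/*
 * Count the complete packets at the start of buf[0..len). *consumed is
 * set to the number of bytes they occupy; a trailing partial packet is
 * left for the next read. Returns -1 if a length field is out of range.
 */
int demo_count_complete_packets(const uint8_t *buf, size_t len,
				size_t *consumed)
{
	size_t off = 0;
	int n = 0;

	while (len - off >= DEMO_BGP_HEADER_SIZE) {
		/* Length is big-endian and follows the 16-byte marker. */
		size_t plen = ((size_t)buf[off + 16] << 8) | buf[off + 17];

		if (plen < DEMO_BGP_HEADER_SIZE
		    || plen > DEMO_BGP_MAX_PACKET_SIZE) {
			*consumed = off;
			return -1;
		}
		if (len - off < plen)
			break; /* body not fully received yet */
		off += plen;
		n++;
	}
	*consumed = off;
	return n;
}
```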
Also changes bgp_packet.c to batch process packets.
To avoid thrashing on useless mutex locks, the scheduling call for
bgp_process_packet has been changed to always succeed at the cost of no
longer being cancel-able. In this case this is acceptable; following the
pattern of other event-based callbacks, an additional check in
bgp_process_packet to ignore stray events is sufficient. Before deleting
the peer all events are cleared which provides the requisite ordering.
XXX: chunk hardcoded to 5, should use something similar to wpkt_quanta
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
* Move and modify all network input related code to bgp_io.c
* Add a real input buffer to `struct peer`
* Move connection initialization to its own thread.c task instead of
piggybacking off of bgp_read()
* Tons of little fixups
Primary changes are in bgp_packet.[ch], bgp_io.[ch], bgp_fsm.[ch].
Changes made elsewhere are almost exclusively refactoring peer->ibuf to
peer->curr since peer->ibuf is now the true FIFO packet input buffer
while peer->curr represents the packet currently being processed by the
main pthread.
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
After implementing threading, bgp_packet.c was serving the double purpose
of consolidating packet parsing functionality and handling actual I/O
operations. This is somewhat messy and difficult to understand. I've
thus moved all code and data structures for handling threaded packet
writes to bgp_io.[ch].
Although bgp_io.[ch] only handles writes at the moment to keep the noise
on this commit series down, for organization purposes, it's probably
best to move bgp_read() and its trappings into here as well and
restructure that code so that read()'s happen in the pthread and packet
processing happens on the main thread.
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
Changes all synchronization primitives to be dynamically allocated. This
should help catch any subtle errors in pthread lifecycles.
This change also pre-initializes synchronization primitives before
threads begin to run, eliminating a potential race condition that
probably would have caused a segfault on startup on a very fast box.
Also changes mutex and condition variable allocations to use
MTYPE_PTHREAD and updates tests to do the proper initializations.
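A minimal sketch of heap-allocated primitives that are initialized
before any thread starts. Plain malloc() stands in for the MTYPE_PTHREAD
allocation, and the struct and function names are hypothetical.

```c
#include <pthread.h>
#include <stdlib.h>

struct demo_io_sync {
	pthread_mutex_t *mtx;
	pthread_cond_t *cond;
};

/* Call this before pthread_create() so no thread sees garbage. */
int demo_io_sync_init(struct demo_io_sync *s)
{
	s->mtx = malloc(sizeof(*s->mtx));
	s->cond = malloc(sizeof(*s->cond));
	if (!s->mtx || !s->cond) {
		free(s->mtx);
		free(s->cond);
		s->mtx = NULL;
		s->cond = NULL;
		return -1;
	}
	pthread_mutex_init(s->mtx, NULL);
	pthread_cond_init(s->cond, NULL);
	return 0;
}

/* Call this only after every thread using the primitives has joined. */
void demo_io_sync_fini(struct demo_io_sync *s)
{
	if (!s->mtx || !s->cond)
		return;
	pthread_mutex_destroy(s->mtx);
	pthread_cond_destroy(s->cond);
	free(s->mtx);
	free(s->cond);
	/* NULL pointers make a late use fail loudly instead of silently. */
	s->mtx = NULL;
	s->cond = NULL;
}
```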
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
Removes the WiP shim and implements proper thread lifecycle management.
* Declare necessary pthread_t's in bgp_master
* Define new MTYPE in lib/thread.c for pthreads
* Allocate and free BGP's pthreads appropriately
* Terminate and join threads appropriately
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
Problem reported that we weren't adjusting the keepalive timer
correctly when we negotiated a lower hold time learned from a
peer. While working on this, found we didn't do inheritance
correctly at all. This fix solves the first problem and also
ensures that the timers are configured correctly based on this
priority order - peer defined > peer-group defined > global config.
This fix also displays the timers as "configured" regardless of
which of the three locations above is used.
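The priority order can be sketched as below. The struct and function
names are hypothetical. Capping the keepalive at one third of the
negotiated hold time follows the suggestion in RFC 4271 and is an
assumption about how the adjustment is made, not a copy of the fix.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct demo_timers {
	bool set; /* true if timers were configured at this level */
	uint16_t keepalive;
	uint16_t holdtime;
};

/* peer defined > peer-group defined > global config */
const struct demo_timers *demo_select_timers(const struct demo_timers *peer,
					     const struct demo_timers *group,
					     const struct demo_timers *global)
{
	if (peer != NULL && peer->set)
		return peer;
	if (group != NULL && group->set)
		return group;
	return global;
}

/* Hold time is the lower of both sides; keepalive follows it down. */
void demo_negotiate(const struct demo_timers *cfg, uint16_t remote_hold,
		    uint16_t *hold, uint16_t *keepalive)
{
	uint16_t third;

	*hold = remote_hold < cfg->holdtime ? remote_hold : cfg->holdtime;
	third = (uint16_t)(*hold / 3);
	*keepalive = cfg->keepalive < third ? cfg->keepalive : third;
}
```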
Ticket: CM-18408
Signed-off-by: Don Slice <dslice@cumulusnetworks.com>
Reviewed-by: CCR-6807
Testing-performed: Manual testing successful, fix tested by
submitter, bgp-smoke completed successfully
This improves code readability and also future-proofs our codebase
against new changes in the data structure used to store interfaces.
The FOR_ALL_INTERFACES_ADDRESSES macro was also moved to lib/ but
for now only babeld is using it.
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
This is an important optimization for users running FRR on systems with
a large number of interfaces (e.g. thousands of tunnels). Red-black
trees scale much better than sorted linked-lists and also store the
elements in an ordered way (contrary to hash tables).
This is a big patch but the interesting bits are all in lib/if.[ch].
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
list_free is occasionally being used to delete the
list, accidentally not deleting all the nodes.
We keep running across this usage pattern. Let's
remove the temptation and only allow list_delete
to handle list deletion.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Convert the list_delete(struct list *) function to use
struct list **. This is to allow the list pointer to be nulled.
I keep running into uses of this list_delete function where we
forget to set the returned pointer to NULL and attempt to use
it and then experience a crash, usually after the developer
has long since left the building.
Let's make the api explicit in setting the list pointer
to null.
Cynical Prediction: This code will expose an attempt
to use the NULL'ed list pointer in some obscure bit
of code.
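The pattern looks roughly like this on a toy singly linked list. The
demo_* types and functions are stand-ins written for this example and
are not the lib/linklist.c implementation.

```c
#include <stdlib.h>

struct demo_node {
	struct demo_node *next;
	void *data;
};

struct demo_list {
	struct demo_node *head;
};

struct demo_list *demo_list_new(void)
{
	return calloc(1, sizeof(struct demo_list));
}

/* Push data at the front; returns 0 on success, -1 on failure. */
int demo_list_add(struct demo_list *l, void *data)
{
	struct demo_node *n = malloc(sizeof(*n));

	if (!n)
		return -1;
	n->data = data;
	n->next = l->head;
	l->head = n;
	return 0;
}

/* Free every node and the list, then NULL the caller's pointer. */
void demo_list_delete(struct demo_list **lp)
{
	struct demo_node *n, *next;

	if (!lp || !*lp)
		return;
	for (n = (*lp)->head; n; n = next) {
		next = n->next;
		free(n);
	}
	free(*lp);
	*lp = NULL;
}
```

Because the caller's pointer is cleared, a second delete becomes a
harmless no-op and a later dereference fails immediately instead of
touching freed memory.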
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Suppose we are configuring a peer in multiple address families
and assigning valid configuration to its peer group. If you
delay the activation of an address family that is not activated
automatically, the peer group data is not copied into that peer.
Suppose we enter this:
router bgp 65001
bgp router-id 6.0.0.17
neighbor ISL peer-group
neighbor ISL advertisement-interval 0
neighbor ISL timers connect 5
neighbor ISL timers 3 10
address-family ipv4 unicast
neighbor ISL allowas-in 1
neighbor swp31s0 interface
neighbor swp31s0 peer-group ISL
address-family ipv6 unicast
neighbor ISL allowas-in 1
We've assigned allowas-in to the ISL peer group. Now suppose
we have a peer start connection to swp31s0. We startup and
auto copy the v4 peer group information onto the peer. We
do not copy the v6 peer group information because it has
not started yet.
Now at a later time if we enter:
address-family ipv6 unicast
neighbor ISL activate
We start the swp31s0 peer in v6, but we are not copying the
peer group data into the v6 swp31s0 peer data structure. As
such we are not respecting the v6 peer group config.
This change modifies and renames the non_peergroup_activate_af
function to peer_activate_af. We also call the function
peer_group2peer_config_copy_af if the peer is part of a peer
group.
I have moved the static function peer_group2peer_config_copy_af
higher up so we don't have to add a static function declaration.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
This fixes the broken indentation of several foreach loops throughout
the code.
From clang's documentation[1]:
ForEachMacros: A vector of macros that should be interpreted as foreach
loops instead of as function calls.
[1] http://clang.llvm.org/docs/ClangFormatStyleOptions.html
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
1) Add hash names to all hash_create calls
2) Fix community_hash, ecommunity_hash and lcommunity_hash key
creation
3) Fix output of community and lcommunity iterators (why would
we want to see the memory location of the bucket?).
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
afi_header_vty_out() is easily replaced with vty_frame(), which means we
can drop a whole batch of "int *write" args as well as the entirety of
bgp_config_write_family_header().
=> AFI/SAFI config writing is now a lot simpler.
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
1. Change hostname_get to cmd_hostname_get
2. Change domainname_get to cmd_domainname_get
3. New API to set domainname
4. Provide a CLI command to set domainname
Signed-off-by: Mitesh Kanjariya <mitesh@cumulusnetworks.com>
Bug introduced by commit 37d361e7. Removing the call to bgp_close()
from bgp_delete() was a mistake.
Reported-by: Don Slice <dslice@cumulusnetworks.com>
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
When the underlying VRF is deleted, ensure that state for the
next hops that BGP registers with zebra for tracking purposes is
properly updated. Otherwise BGP will not re-register the next hop
when the VRF is re-created, resulting in the next hop staying
unresolved.
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
Reviewed-by: Don Slice <dslice@cumulusnetworks.com>
Ticket: CM-17456
Reviewed By: CCR-6587
Testing Done: Manual, bgp-min, vrf
There are two parts to this commit:
1. Create a database of self tunnel-ips for use in the martian nexthop check.
In a CLAG setup, the tunnel-ip (VNI UP) notification comes before the clag-anycast-ip comes up in the system.
This was causing our self next hop check to fail and we were installing routes with a martian nexthop in zebra.
We need to keep this info in a separate database for all local tunnel-ips.
This database will be used in parallel with the self next hop database for martian nexthop checks.
2. When a local VNI comes up, update the tunnel-ip database and filter routes in the RD table if necessary.
In the case of EVPN we might receive routes from the clag peer before the clag-anycast-ip and VNI are up on the system.
We will store the routes in the RD table for later processing.
When the VNI comes UP, we loop through all the routes and install them in zebra if required.
However, we were missing the martian nexthop check in this code path.
From now onwards, when a VNI comes UP,
we will first update the tunnel-ip database.
We then loop through all the routes in the RD table and apply the martian next hop filter if required.
Things not covered in this commit but are required:
This processing is needed in general when an address becomes a connected address.
We need to loop through all the routes in BGP and apply the martian nexthop filter if necessary.
This will be taken care of in a separate bug.
Ticket:CM-17271/CM-16911
Reviewed By: ccr-6542
Testing Done: Manual
Signed-off-by: Mitesh Kanjariya <mitesh@cumulusnetworks.com>
When displaying the config, bgpd only checked for the existence of a
peer-group prefix-list before deciding to not display the outbound
prefix-list. This commit updates the outbound prefix-list logic to
match the inbound.
afi_header_vty_out is sidestepping the vty code, writing straight to the
output (either stdout or the obuf), which results in newline translation
not being performed.
Easiest fix is replacing it with a macro. Longer-term, I have some old
code to add "prefaces" to the vty output, planning to dig that up.
Fixes: #949 ("bgpd show running doesn't show new lines")
Reported-by: Lou Berger <lberger@labn.net>
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
The defines:
ONE_DAY_SECOND
ONE_WEEK_SECOND
ONE_YEAR_SECOND
were being defined all over the system, move the
define to a central location.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
SAFI values have been a major source of confusion over the last few
years. That's because each SAFI needs to be represented in two different
ways:
* IANA's value used to send/receive packets over the network;
* Internal value used for array indexing.
In the second case, defining reserved values makes no sense because we
don't want to index SAFIs that simply don't exist. The sole purpose of
the internal SAFI values is to remove the gaps we have among the IANA
values, which would represent wasted memory in C arrays. With that said,
remove these reserved SAFIs to avoid further confusion in the future.
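The two representations can be sketched as a dense internal enum plus a
lookup table to the IANA values. The IANA numbers below are the
registered ones; the DEMO_* enum, its ordering and the function names
are hypothetical and do not reproduce the definitions in lib/.

```c
#include <stdint.h>

/* Dense internal values, usable directly as array indexes. */
enum demo_safi {
	DEMO_SAFI_UNICAST = 0,
	DEMO_SAFI_MULTICAST,
	DEMO_SAFI_LABELED_UNICAST,
	DEMO_SAFI_ENCAP,
	DEMO_SAFI_EVPN,
	DEMO_SAFI_MPLS_VPN,
	DEMO_SAFI_MAX
};

/* IANA values as sent on the wire; note the gaps (3, 5, 6, 8..69, ...). */
static const uint8_t demo_safi_iana[DEMO_SAFI_MAX] = {
	[DEMO_SAFI_UNICAST] = 1,
	[DEMO_SAFI_MULTICAST] = 2,
	[DEMO_SAFI_LABELED_UNICAST] = 4,
	[DEMO_SAFI_ENCAP] = 7,
	[DEMO_SAFI_EVPN] = 70,
	[DEMO_SAFI_MPLS_VPN] = 128,
};

uint8_t demo_safi_to_iana(enum demo_safi safi)
{
	return demo_safi_iana[safi];
}

/* Returns the internal value, or -1 for a SAFI we do not index. */
int demo_safi_from_iana(uint8_t iana)
{
	int i;

	for (i = 0; i < DEMO_SAFI_MAX; i++)
		if (demo_safi_iana[i] == iana)
			return i;
	return -1;
}
```

A per-SAFI array then needs only DEMO_SAFI_MAX slots instead of 129.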
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
Signed-off-by: Daniel Walton <dwalton@cumulusnetworks.com>
swpX peers all start out with the same sockunion so initially they all
go into the same hash bucket. Once IPv6 ND has worked its magic they
will have different sockunions and will go in different buckets...life
is good.
Until then though, we are in a phase where all swpX peers have the same
sockunion. Once we have HASH_THRESHOLD (10) swpX peers and call
hash_get for a new swpX peer the hash code calls hash_expand(). This
happens because there are more than HASH_THRESHOLD entries in a single
bucket so the logic is "expand the hash to spread things out"...in our
case expanding doesn't spread out the swpX peers because all of their
sockunions are the same.
I looked at having peer_hash_make and peer_hash_same consider the ifname
of the swpX peer but that is a large change that we don't want to make
at the moment. So the fix is to put a cap on how large we are
willing to let the hash table get. By default there is no limit but if
max_size is set we will not allow the hash to expand above that.
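A sketch of the cap, assuming the table doubles on each expansion and
that a max_size of 0 means "no limit". Both conventions, and all names
below, are assumptions for this example rather than the lib/hash.c code.

```c
#include <stdbool.h>
#include <stddef.h>

struct demo_hash {
	size_t size;     /* current number of buckets */
	size_t max_size; /* 0 means unlimited (assumed convention) */
};

/*
 * Try to double the bucket count. Returns false and leaves the table
 * untouched if that would exceed max_size, so a bucket full of
 * identical keys cannot make the table grow without bound.
 */
bool demo_hash_expand(struct demo_hash *h)
{
	size_t new_size = h->size * 2;

	if (h->max_size != 0 && new_size > h->max_size)
		return false;
	h->size = new_size;
	return true;
}
```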
This reverts commit c14777c6bf.
clang 5 is not widely available enough for people to indent with. This
is particularly problematic when rebasing/adjusting branches.
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
Signed-off-by: Daniel Walton <dwalton@cumulusnetworks.com>
This allows frr-reload.py (or anything else that scripts via vtysh)
to know if the vtysh command worked or hit an error.
When the BGP router-id changes, EVPN routes need to be processed due
to potential change to their RD.
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
Implement configuration options for EVPN. The configuration options include
VNI configuration with RD and Import and Export Route Targets. Also, display
the EVPN configuration.
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
Signed-off-by: Daniel Walton <dwalton@cumulusnetworks.com>
Define the EVPN (EVI) hash table and related structures and initialize
and cleanup.
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
Reviewed-by: Donald Sharp <sharpd@cumulusnetworks.com>