as per [0], ipv6 adress format introduces an ipv6 offset that needs to
be extracted too. The change include the validation, decoding for
further usage with policy-routing and decoding for dumping.
[0] https://tools.ietf.org/html/draft-ietf-idr-flow-spec-v6-09
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
until now, the assumption was done in bgp flowspec code that the
information contained was an ipv4 flowspec prefix. now that it is
possible to handle ipv4 or ipv6 flowspec prefixes, that information is
stored in prefix_flowspec attribute. Also, some unlocking is done in
order to process ipv4 and ipv6 flowspec entries.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Issue:
1. Initially BGP start listening to socket.
2. Start timer expires and BGP tries to connect to peer and moved
to Idle->connect (lets say peer datastructre X)
3. Connect for X succeeds and hence moved from idle ->connect with
FD-x.
4. A incoming connection is accepted and a new peer datastructure Y
is created with FD-y moves from idle->Active state.
5. Peer datastercture Y FD-y sends out OPEN and moves to
Active->Opensent state.
6. Peer datastrcture Y FD-y receives OPEN and moved from Opensent->
Openconfirm state.
7. Meanwhile on peer datastrcture X FD-x sends out a OPEN message
and moved from connect->Opensent.
8. For peer datastrcture Y FD-y keep alive is received and it is
moved from OpenConfirm->Established.
9. In this case peer datastructure Y FD-y is a accepted connection
so we try to copy all its parameter to peer datastructure X and
delete Y.
10. During this process TCP connection for the accepted connection
(FD-y) goes down and hence get remote address and port fails.
11. With this failure bgp_stop function for both peer datastrure X
and peer datastructure Y is called.
12. By this time all the parameters include state for datastrcture
for X and Y are exchanged. Peer Y FD-y when it entered this
function had state OpenConfirm still which has been moved to peer
datastrcture X.
13. In bgp_stop it will stop all the timers and take action only if
peer is in established state. Now that peer datastrcture X and Y
are not in established state (in this function) it will simply
close all timers and close the socket and assigns socket for both
the peer datastrcture to -1.
14. Peer datastrcture Y will be deleted as it is a datastrcture created
due to accept of connection where as peer datastrcture X will be held
as it is created with configuration.
15. Now peer datastrcture X now holds a state of OpenConfirm without any
timers running.
16. With this any new incoming connection will never be able to establish
as there is config connection X which is stuck in OpenConfirm.
Fix:
While transferring the peer datastructure Y FD-y (accepted connection)
to the peer datastructure X, if TCP connection for FD-y goes down, then
1. Call fsm event bgp_stop for X (do cleanup with bgp_stop and move the
state to Idle) and
2. Call fsm event bgp_stop for Y (do cleanup with bgp_stop and gets deleted
since it is an accept connection).
Signed-off-by: Sarita Patra <saritap@vmware.com>
Issue:
1. Initially BGP start listening to socket.
2. Start timer expires and BGP tries to connect to peer and moved
to Idle->connect (lets say peer datastructre X)
3. Peer datastrcture Y FD-X receives OPEN and moved from Opensent->
Openconfirm state and start the hold timer.
4. In the OpenConfirm state, the hold timer is stopped. So peer X
waits for Keepalive message from peer. If the Keepalive message
is not received, then it will be in OpenConfirm state for
indefinite time.
5. Due to this it neither close the existing connection nor it will
accept any connection from peer.
Fix:
In the OpenConfirm state, don't stop the hold timer.
1. Upon receipt of a neighbor’s Keepalive, the state is moved to
Established.
2. But If the hold timer expires, a stop event occurs, the state
is moved to Idle.
This is as per RFC.
Signed-off-by: Sarita Patra <saritap@vmware.com>
* Applied style suggestions by automated compliance check.
* Fixed function bgp_shutdown_enable to use immutable message string.
Signed-off-by: David Schweizer <dschweizer@opensourcerouting.org>
When iterating over a `show ip bgp vrf all neighbors json` command
bgp is crashing.
The json variable was being double freed. When freeing it, set it
to NULL and then check to make sure it exists before we free.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The RFC states:
The BGP Identifier is a 4-octet, unsigned, non-zero integer that
should be unique within an AS. The value of the BGP Identifier
for a BGP speaker is determined on startup and is the same for
every local interface and every BGP peer.
We were going slightly beyond this and ensuring that the address
was a specific range of addresses which is no longer relevant.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
* Replaced alias for bgp shutdown command with separate regular command
to prevent internal CLI errors.
Signed-off-by: David Schweizer <dschweizer@opensourcerouting.org>
* Peers are now automatically restarted by the reconnect timer instead
of a ManualStart event after lifting the administrative shutdown.
* Question of when to log what remains.
* Compiles and works as intended now.
Signed-off-by: David Schweizer <dschweizer@opensourcerouting.org>
* Fixed integration in FSM and packet handling.
* Added CLI "show" output, incl. JSON.
* For review and testing only.
Signed-off-by: David Schweizer <dschweizer@opensourcerouting.org>
* Changes allow administratively shutting down all peers of a BGP
instance.
* New CLI commands "[no] bgp shutdown" in vty shell.
* For review and testing only.
Signed-off-by: David Schweizer <dschweizer@opensourcerouting.org>
This would be handy for situations when a notification was sent, but it's
absolutely not clear who triggered that.
Just in case dumping all attributes under the debug mode would help finding
the _bad_ attribute.
Signed-off-by: Donatas Abraitis <donatas.abraitis@gmail.com>
* Removed old timer thread resets, since this has been taken care of
after execution of the threads by the thread_fetch function in
lib/thread.c for quite some time now.
Signed-off-by: David Schweizer <dschweizer@opensourcerouting.org>
Currently, bgpPeerTable only looks the default BGP instance. Most
vendors return all the available peers in this table. This commit
exposes all BGP instances.
The other tables are unchanged as it doesn't make sense to expose
routes from random VRFs into a single table. Vendors are using SNMP
contexts for that but we don't have support for it. Therefore, do
nothing.
Fix#6077
Signed-off-by: Vincent Bernat <vincent@bernat.ch>
SYNC routes are paths rxed from a local-ES peer. These routes result in
the installation of local dataplane entries i.e. with access port as
destination (vs. the remote-VTEP destination that results in the packet
being sent via the VxLAN overlay).
If a SYNC path is selected as the best path it is always turned around
into a local path which immediately lowers the status of the SYNC path
to non-best. However we need to keep track of the highest MM seq-number
and peer activity to continue advertising the local path. In order to
do that we need information from the "second-best" SYNC path to be
bubbled up to the local best path. This "SYNC" info is then consolidated
and sent to zebra which is responsible for the MM handling and local
path management.
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
When a SYNC route i.e. a route with a local ES as destination is
rxed on a switch (say L11) from an ES peer (say L12) a local
MAC/neigh entry is created on L11 with the local access port
as dest port.
Creation of the local entry triggers a local path advertisement from
L11. This could be a "locally-active" path or a "locally-inactive"
path. Inactive paths are advertised with the proxy bit.
To ensure that the local entry is not deleted by a SYNC route it is
given absolute precedence over peer-paths.
If there are two non-local paths with the same dest ES and same MM
seq number the non-proxy path is preferred. This is done to ensure
that we don't lose track of the peer-activity.
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
A new proxy flag has been added to the already existing NA extended
community to allow proxy advertisment of a local host by a VTEP that is
yet to indpendently establish local reachability.
Reference: draft-rbickhart-evpn-ip-mac-proxy-adv
The extendend mac-mobility sequence number needs to be synced across
the ES peers. However we cannot let a ES-peer path win over a local
path on the same ES. To accomplish that some parameters such as the
MM seq number are bubbled up from the non-best path to the local path.
This mechanism is explained further in the path-selection patch.
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
The `struct evpn_ead_addr` structure had a prefix length
associated with it. This value was only ever set never
used. Remove this from our system. The other
nice thing about this change is that it puts back
the sizeof struct route_node to 192 bytes.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
1. Sample ES display
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
torm-11# sh bgp l2vpn evpn es
ES Flags: L local, R remote, I inconsistent
VTEP Flags: E ESR/Type-4, A active nexthop
ESI Flags RD #VNIs VTEPs
03:00:00:00:00:01:11:00:00:01 LR 27.0.0.15:15 10 27.0.0.16(EA)
03:00:00:00:00:01:22:00:00:02 LR 27.0.0.15:16 10 27.0.0.16(EA)
03:00:00:00:00:01:22:00:00:03 LR 27.0.0.15:17 10 27.0.0.16(EA)
03:00:00:00:00:02:11:00:00:01 R - 10 27.0.0.17(A),27.0.0.18(A)
03:00:00:00:00:02:22:00:00:02 R - 10 27.0.0.17(A),27.0.0.18(A)
03:00:00:00:00:02:22:00:00:03 R - 10 27.0.0.17(A),27.0.0.18(A)
torm-11#
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
2. Sample ES-EVI display
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
torm-11# sh bgp l2vpn evpn es-evi
Flags: L local, R remote, I inconsistent
VTEP-Flags: E EAD-per-ES, V EAD-per-EVI
VNI ESI Flags VTEPs
1005 03:00:00:00:00:01:11:00:00:01 LR 27.0.0.16(EV)
1005 03:00:00:00:00:01:22:00:00:02 LR 27.0.0.16(EV)
1005 03:00:00:00:00:01:22:00:00:03 LR 27.0.0.16(EV)
1005 03:00:00:00:00:02:11:00:00:01 R 27.0.0.17(EV),27.0.0.18(EV)
1005 03:00:00:00:00:02:22:00:00:02 R 27.0.0.17(EV),27.0.0.18(EV)
1005 03:00:00:00:00:02:22:00:00:03 R 27.0.0.17(EV),27.0.0.18(EV)
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
3. Sample EAD route display
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
torm-11# sh bgp l2vpn evpn route type ead
BGP table version is 19, local router ID is 27.0.0.15
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal
Origin codes: i - IGP, e - EGP, ? - incomplete
EVPN type-1 prefix: [4]:[ESI]:[EthTag]:[IPlen]:[VTEP-IP]
EVPN type-2 prefix: [2]:[EthTag]:[MAClen]:[MAC]:[IPlen]:[IP]
EVPN type-3 prefix: [3]:[EthTag]:[IPlen]:[OrigIP]
EVPN type-4 prefix: [4]:[ESI]:[IPlen]:[OrigIP]
EVPN type-5 prefix: [5]:[EthTag]:[IPlen]:[IP]
Network Next Hop Metric LocPrf Weight Path
Extended Community
Route Distinguisher: 27.0.0.15:5
*> [1]:[0]:[03:00:00:00:00:01:11:00:00:01]:[128]:[0.0.0.0]
27.0.0.15 32768 i
ET:8 RT:5550:1009
*> [1]:[0]:[03:00:00:00:00:01:22:00:00:02]:[128]:[0.0.0.0]
27.0.0.15 32768 i
ET:8 RT:5550:1009
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
This is the base patch that brings in support for Type-1 routes.
It includes support for -
- Ethernet Segment (ES) management
- EAD route handling
- MAC-IP (Type-2) routes with a non-zero ESI i.e. Aliasing for
active-active multihoming
- Initial infra for consistency checking. Consistency checking
is a fundamental feature for active-active solutions like MLAG.
We will try to levarage the info in the EAD-ES/EAD-EVI routes to
detect inconsitencies in access config across VTEPs attached to
the same Ethernet Segment.
Functionality Overview -
========================
1. Ethernet segments are created in zebra and associated with
access VLANs. zebra sends that info as ES and ES-EVI objects to BGP.
2. BGP advertises EAD-ES and EAD-EVI routes for the locally attached
ethernet segments.
3. Similarly BGP processes EAD-ES and EAD-EVI routes from peers
and translates them into ES-VTEP objects which are then sent to zebra
as remote ESs.
4. Each ES in zebra is associated with a list of active VTEPs which
is then translated into a L2-NHG (nexthop group). This is the ES
"Alias" entry
5. MAC-IP routes with a non-zero ESI use the alias entry created in
(4.) to forward traffic i.e. a MAC-ECMP is done to these remote-ES
destinations.
EAD route management (route table and key) -
============================================
1. Local EAD-ES routes
a. route-table: per-ES route-table
key: {RD=ES-RD, ESI, ET=0xffffffff, VTEP-IP)
b. route-table: per-VNI route-table
Not added
c. route-table: global route-table
key: {RD=ES-RD, ESI, ET=0xffffffff)
2. Remote EAD-ES routes
a. route-table: per-ES route-table
Not added
b. route-table: per-VNI route-table
key: {RD=ES-RD, ESI, ET=0xffffffff, VTEP-IP)
c. route-table: global route-table
key: {RD=ES-RD, ESI, ET=0xffffffff)
3. Local EAD-EVI routes
a. route-table: per-ES route-table
Not added
b. route-table: per-VNI route-table
key: {RD=0, ESI, ET=0, VTEP-IP)
c. route-table: global route-table
key: {RD=L2-VNI-RD, ESI, ET=0)
4. Remote EAD-EVI routes
a. route-table: per-ES route-table
Not added
b. route-table: per-VNI route-table
key: {RD=0, ESI, ET=0, VTEP-IP)
c. route-table: global route-table
key: {RD=L2-VNI-RD, ESI, ET=0)
Please refer to bgp_evpn_mh.h for info on how the data-structures are
organized.
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
Add ESI as an inline attribute field along with the other EVPN
attributes. This may be re-worked when the rest of the EVPN
attributes find a new home.
Some cleanup has been done to get rid of stale/unused references
to ESI. And also to consolidate duplicate definitions of ES ID
types.
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
1. EAD routes require support for ESI_LABEL extended community. The
primary info in this EC is a flags the specifies if the ES is
Single-active or active-acive.
2. Also fixed up ES_IMPORT_RT string. Support was added a long time
ago for ESR/Type-4 routes but it has not really been exercised for
MH functionality till now.
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
Re-org only; no other code changes. This is being done to make maintanence
of MH functionality (which will have more code added to it) easy.
The code moved here was originally committed via -
'commit 50f74cf131 ("*: support for evpn type-4 route")'
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
Revert "zebra: support for macvlan interfaces"
This reverts commit bf69e212fd.
Revert "doc: add some documentation about bgp evpn netns support"
This reverts commit 89b97c33d7.
Revert "zebra: dynamically detect vxlan link interfaces in other netns"
This reverts commit de0ebb2540.
Revert "bgpd: sanity check when updating nexthop from bgp to zebra"
This reverts commit ee9633ed87.
Revert "lib, zebra: reuse and adapt ns_list walk functionality"
This reverts commit c4d466c830.
Revert "zebra: local mac entries populated in correct netnamespace"
This reverts commit 4042454891.
Revert "zebra: when parsing local entry against dad, retrieve config"
This reverts commit 3acc394bc5.
Revert "bgpd: evpn nexthop can be changed by default"
This reverts commit a2342a2412.
Revert "zebra: zvni_map_to_vlan() adaptation for all namespaces"
This reverts commit db81d18647.
Revert "zebra: add ns_id attribute to mac structure"
This reverts commit 388d5b438e.
Revert "zebra: bridge layer2 information records ns_id where bridge is"
This reverts commit b5b453a2d6.
Revert "zebra, lib: new API to get absolute netns val from relative netns val"
This reverts commit b6ebab34f6.
Revert "zebra, lib: store relative default ns id in each namespace"
This reverts commit 9d3555e06c.
Revert "zebra, lib: add an internal API to get relative default nsid in other ns"
This reverts commit 97c9e7533b.
Revert "zebra: map vxlan interface to bridge interface with correct ns id"
This reverts commit 7c990878f2.
Revert "zebra: fdb and neighbor table are read for all zns"
This reverts commit f8ed2c5420.
Revert "zebra: zvni_map_to_svi() adaptation for other network namespaces"
This reverts commit 2a9dccb647.
Revert "zebra: display interface slave type"
This reverts commit fc3141393a.
Revert "zebra: zvni_from_svi() adaptation for other network namespaces"
This reverts commit 6fe516bd4b.
Revert "zebra: importation of bgp evpn rt5 from vni with other netns"
This reverts commit 28254125d0.
Revert "lib, zebra: update interface name at netlink creation"
This reverts commit 1f7a68a2ff.
Signed-off-by: Pat Ruddy <pat@voltanet.io>
Added a macro to validate the v4 mapped v6 address.
Modified bgp receive & send updates for v4 mapped v6 address as
nexthop and installing it as recursive nexthop in RIB.
Minor change in fpm while sending the routes for nexthop as
v4 mapped v6 address.
Signed-off-by: Kaushik <kaushik@niralnetworks.com>
When we have a prefix that has been selected, note that that
particular flag has been set and give that information to the
end user.
eva# show bgp ipv4 uni neighbors 192.168.161.131 prefix-counts
Prefix counts for 192.168.161.131, IPv4 Unicast
PfxCt: 814246
Counts from RIB table walk:
Adj-in: 0
Damped: 0
Removed: 0
History: 0
Stale: 0
Valid: 814246
All RIB: 814246
PfxCt counted: 814246
PfxCt Best Selected: 0
Useable: 814246
eva# show bgp ipv4 uni neighbors 192.168.161.2 prefix-counts
Prefix counts for 192.168.161.2, IPv4 Unicast
PfxCt: 814070
Counts from RIB table walk:
Adj-in: 0
Damped: 0
Removed: 0
History: 0
Stale: 0
Valid: 814070
All RIB: 814070
PfxCt counted: 814070
PfxCt Best Selected: 814070
Useable: 814070
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Coverity has noticed that we are using bgp_evpn after
we have already NULL checked it one time. Add an assert
to make Coverity happy here, if we get to this point
something terrible has happened.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Coverity rightly points out that bgp_table_top might return
NULL and immediately deref'ing it might be a problem.
Add a bit of safety.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
I wanted to preserve the old code flow to see what might
be needed in the future in commit:
23ca3269da
Coverity doesn't like dead code. So let's comment it out.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
If _force_ is set, then ALL prefixes are counted for maximum instead of
accepted only. This is useful for cases where an inbound filter is applied,
but you want maximum-prefix to act on ALL (including filtered) prefixes.
For instance, we have a configuration like:
neighbor r1 maximum-prefix 10
neighbor r1 prefix-list custom in
!
ip prefix-list custom seq 1 permit 10.0.0.0/24
ip prefix-list custom seq 2 permit 10.0.1.0/24
This will accept only 2 prefixes and discard all others instead of
shutting down the session when 10 is reached.
With this new knob (force), we will count all received prefixes and shutdown
the session when 10 is reached.
The bigger problem is when you have lots of peers with full feed and such a
configuration like in an example.
This is kinda re-ordering of how to treat filter vs. maximum-prefix.
Signed-off-by: Donatas Abraitis <donatas.abraitis@gmail.com>
While checking my BGP debugging settings at the console, I noticed
this message was missing a newline. Add it to be consistent with the
other similar messages.
Signed-off-by: Russell Bryant <rbryant@redhat.com>
During perf testing of receiving and installing 7.5 million
routes into zebra it was noticed that memset in bgp_zebra_announce
was taking ~11% of the runtime. With this change bgp_zebra_announce
now no longer has any appreciable time spent in memset as reported
by perf. In addition bgp_zebra_announce run time in perf was
reduced by a composite amount.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
```
(gdb) bt
0 0x00007f45a6f0a781 in raise () from /lib/x86_64-linux-gnu/libc.so.6
1 0x00007f45a6ef455b in abort () from /lib/x86_64-linux-gnu/libc.so.6
2 0x00007f45a7781920 in core_handler (signo=11, siginfo=0x7fffac7b84b0, context=<optimized out>) at lib/sigevent.c:228
3 <signal handler called>
4 0x000055a4133c0f32 in bgp_table_stats (vty=vty@entry=0x55a415acb240, bgp=0x0, afi=AFI_IP, safi=SAFI_UNICAST, json_array=json_array@entry=0x0) at bgpd/bgp_route.c:11412
5 0x000055a4133c13fb in show_ip_bgp_afi_safi_statistics (self=<optimized out>, vty=0x55a415acb240, argc=6, argv=<optimized out>) at bgpd/bgp_route.c:10749
6 0x00007f45a773917d in cmd_execute_command_real (vline=vline@entry=0x55a415ab7e10, vty=vty@entry=0x55a415acb240, cmd=cmd@entry=0x0, filter=FILTER_RELAXED)
at lib/command.c:909
7 0x00007f45a773afdf in cmd_execute_command (vline=vline@entry=0x55a415ab7e10, vty=vty@entry=0x55a415acb240, cmd=0x0, vtysh=vtysh@entry=0) at lib/command.c:968
8 0x00007f45a773b135 in cmd_execute (vty=vty@entry=0x55a415acb240, cmd=cmd@entry=0x55a415ace950 "show ip bgp vrf all statistics", matched=matched@entry=0x0,
vtysh=vtysh@entry=0) at lib/command.c:1122
9 0x00007f45a7794d62 in vty_command (vty=vty@entry=0x55a415acb240, buf=0x55a415ace950 "show ip bgp vrf all statistics") at lib/vty.c:526
10 0x00007f45a7794fb6 in vty_execute (vty=vty@entry=0x55a415acb240) at lib/vty.c:1293
11 0x00007f45a7797804 in vtysh_read (thread=<optimized out>) at lib/vty.c:2126
12 0x00007f45a778f641 in thread_call (thread=thread@entry=0x7fffac7bb040) at lib/thread.c:1550
13 0x00007f45a775b6d8 in frr_run (master=0x55a415542820) at lib/libfrr.c:1098
14 0x000055a4133815d6 in main (argc=10, argv=0x7fffac7bb2a8) at bgpd/bgp_main.c:509
```
"show ip bgp vrf all statistics" should show statistics for all VRFs if "all"
is specified.
Signed-off-by: Donatas Abraitis <donatas.abraitis@gmail.com>
1) When a session comes up for a peer and if the peer has not adverised
the GR capabilities, BGP sends a request to Zebra to clear any
stale routes that might exist from that peer.
2) When OPEN message is received from the peer, clear the previously
advertised GR capability by the peer, if the lastest received
OPEN message does not contain the GR capability.
Signed-off-by: NaveenThanikachalam <nthanikachal@vmware.com>
Remove mid-string line breaks, cf. workflow doc:
.. [#tool_style_conflicts] For example, lines over 80 characters are allowed
for text strings to make it possible to search the code for them: please
see `Linux kernel style (breaking long lines and strings)
<https://www.kernel.org/doc/html/v4.10/process/coding-style.html#breaking-long-lines-and-strings>`_
and `Issue #1794 <https://github.com/FRRouting/frr/issues/1794>`_.
Scripted commit, idempotent to running:
```
python3 tools/stringmangle.py --unwrap `git ls-files | egrep '\.[ch]$'`
```
Signed-off-by: David Lamparter <equinox@diac24.net>
It's hard to cope with cases when next-hop is changed/unchanged or
peers are non-direct.
It would be better to show the hostname and nexthop IP address (both)
under `show bgp` to quickly identify the source and the real next-hop
of the route.
If `bgp default show-nexthop-hostname` is toggled the output looks like:
```
spine1-debian-9# show bgp
BGP table version is 1, local router ID is 2.2.2.2, vrf id 0
Default local pref 100, local AS 65002
Status codes: s suppressed, d damped, h history, * valid, > best, = multipath,
i internal, r RIB-failure, S Stale, R Removed
Nexthop codes: @NNN nexthop's vrf id, < announce-nh-self
Origin codes: i - IGP, e - EGP, ? - incomplete
Network Next Hop Metric LocPrf Weight Path
* 2a02:4780::/64 fe80::a00:27ff:fe09:f8a3(exit1-debian-9)
0 0 65001 ?
spine1-debian-9# show ip bgp
BGP table version is 5, local router ID is 2.2.2.2, vrf id 0
Default local pref 100, local AS 65002
Status codes: s suppressed, d damped, h history, * valid, > best, = multipath,
i internal, r RIB-failure, S Stale, R Removed
Nexthop codes: @NNN nexthop's vrf id, < announce-nh-self
Origin codes: i - IGP, e - EGP, ? - incomplete
Network Next Hop Metric LocPrf Weight Path
*> 10.255.255.0/24 192.168.0.1(exit1-debian-9)
0 0 65001 ?
```
Signed-off-by: Donatas Abraitis <donatas.abraitis@gmail.com>
BFD profiles can now be used on the interface level like this:
interface eth1
ip router isis 1
isis bfd
isis bfd profile default
Here the 'default' profile needs to be specified as usual in the
bfdd configuration.
Signed-off-by: GalaxyGorilla <sascha@netdef.org>
```
exit1-debian-9# show bgp summary
IPv4 Unicast Summary:
BGP router identifier 192.168.0.1, local AS number 100 vrf-id 0
BGP table version 8
RIB entries 15, using 2880 bytes of memory
Peers 2, using 43 KiB of memory
Neighbor V AS MsgRcvd MsgSent TblVer InQ OutQ Up/Down State/PfxRcd PfxSnt
192.168.0.2 4 200 10 6 0 0 0 00:00:35 8 8
2a02:4780::2 4 0 0 1 0 0 0 never Active 0
Total number of neighbors 2
exit1-debian-9# show bgp summary established
IPv4 Unicast Summary:
BGP router identifier 192.168.0.1, local AS number 100 vrf-id 0
BGP table version 8
RIB entries 15, using 2880 bytes of memory
Peers 2, using 43 KiB of memory
Neighbor V AS MsgRcvd MsgSent TblVer InQ OutQ Up/Down State/PfxRcd PfxSnt
192.168.0.2 4 200 10 6 0 0 0 00:00:39 8 8
Total number of neighbors 2
exit1-debian-9# show bgp summary failed
IPv4 Unicast Summary:
BGP router identifier 192.168.0.1, local AS number 100 vrf-id 0
BGP table version 8
RIB entries 15, using 2880 bytes of memory
Peers 2, using 43 KiB of memory
Neighbor EstdCnt DropCnt ResetTime Reason
2a02:4780::2 0 0 never Waiting for peer OPEN
Total number of neighbors 2
exit1-debian-9#
```
Signed-off-by: Donatas Abraitis <donatas.abraitis@gmail.com>
If the RT changes on a L3VPN route then any leak of this route into
a VRF should be withdrawn.
Extend existing EVPN check for RT change to cover L3VPN routes.
Signed-off-by: Pat Ruddy <pat@voltanet.io>
the validation of rpki routes will impact the matching bgp instance.
Until now, the rpki was triggering validation of all bgp entries.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
rpki config can be displayed in the 'show running-config'.
there is a fix to be done yet, this is related to the order of rpki per
vrf configuration. actually, the output is not saveable in the
running-config since the rpki commands are swapped. this prevents from
running rpki config at startup.
That commit also changes the identation, since rpki configure node was
with one extra space. reducing this, and add the changes for vrf
configuration too.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
rpki vrf subnode is instantiated under the vrf subnode.
It it to be noted that this commit contains a change in vtysh.
Actually, the output of bgp daemon from show running-config is extracted
in vtysh, and reengineered ( hence the vtysh_config.c change done). This
permits having a subnode under vrf sub node.
Also, add vrf node support to bgpd, as rpki command can not be found
under vrf node.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
it is possible to dump rpki commands per vrf context.
also, rpki start/stop commands are also appended with vrfname parameter.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
this commit change introduces a callback function pointer that rtrlib
calls. this permits to create the socket and initialising the socket
with the right information, in the right vrf. Adding to this, rpki uses
a hook to be triggered when a vrf is enabled/disabled. in this way,
start mechanisms will be triggered only when vrf is available, and stop
mechanism will be done upon vrf disable event.
Adding to this, the cache structure contains a back pointer to the rpki
vrf structure. this is done to retrieve the vrf where the cache points
to.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
rpki context can be removed by doing 'no rpki' command from configure
node. this work allows to allocate the associated rpki_vrf context when
entering in rpki node, instead of at the initialisation step.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
this work is a preparatory work so that rpki can have per-vrf contexts.
the work consists in allocating a rpki_vrf structure with all inside:
rtr_config, cache, etc..
This work is also necessary in the long term support with yang
northboundapi. Indeed, there may be highly possible that yang context
for rpki be defined per core instance.
That work also instantiates a list of rpki_vrf, though only one instance
is created.
That work also introduces a vrfname field attribute that is set to null
for now , and stands for default vrf where rpki is configured on.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
rpki debugging is linked with standard bgp debugging facilities.
- debug rpki is dumped in running-config if the command is executed from
configure terminal.
- show debugging indicated whether rpki debug is enabled or not.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
when a plugin is attached, some debugs may be attached to that plugin.
For that, add one hook that is interacting with vty: a boolean indicates
what the usage is for: either for impacting the 'show running-config',
or for impacting the 'show debugging' command.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
the show running-config rpki was displaying systematically the default
values, when at least one cache server was configured. now, if the rpki
configuration has been changed, either because of a new cache server, or
because of a change in the default settings, then the associated
configuration is dumped in the 'show running-config' command.
adding to this, to permit user to dump the settings values, the command
'show rpki configuration' dumps the values whatever default or not.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
if ssh cache servers are configured, then show rpki-table is looking at
the tcp server context. Fix this by checking the server cache type, and
also display the ssh context if this is configured.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
currently, private and public key files must differ with the suffix
keywork : '.pub'. If it is not the case, the pub key is ignored.
Inform user for that.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
We have a bunch of code in bgp_vty.c that was passing
to peer_af_flag_modify_vty more than 1 flag at a time.
This was causing the underlying routines to get the
flags wrong. In order to prevent this convert all the
places where we send multiple flags down to this function
to individual flag changes.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
SIGHUP is ostensibly supposed to reload configuration
from a fresh slate. This is currently horribly broken
so much so that bgp just crashes. I see no point
in trying to make this work considering the yang
work coming down the pike.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Issue: bgp_process_writes will be called when the fd is writable.
And it will bgp_generate_updgrp_packets to generate the
update packets no matter MRAI is set or not.
Fix: bgp_generate_updgrp_packets thread will return without sending
any update when MRAI timer is still running.
Signed-off-by: Richard Wu <wutong23@baidu.com>
This is the bulk part extracted from "bgpd: Convert from `struct
bgp_node` to `struct bgp_dest`". It should not result in any functional
change.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
We use ASN:VNI format to calculate auto RT for L3VNI.
When L3VNI is not configured, if we delete the configured RT, incorrect auto-RT
value is generated as VRF VNI is 0.
Fix:
Do not configure auto-RT if L3VNI is not configured.
Trigger:
1. Delete L3VNI
2. Delete configured RT.
Before fix:
dev# sh bgp vrf vrf-blue vni
BGP VRF: vrf-blue
Local-Ip: 10.100.0.1
L3-VNI: 0
Rmac: 00:00:00:00:00:00
VNI Filter: none
L2-VNI List:
Export-RTs:
RT:101:0
Import-RTs:
RT:101:0
RD: 10.100.0.1:2
After fix:
dev# sh bgp vrf vrf-blue vni
BGP VRF: vrf-blue
Local-Ip: 10.100.0.1
L3-VNI: 0
Rmac: 00:00:00:00:00:00
VNI Filter: none
L2-VNI List:
Export-RTs:
Import-RTs:
RD: 10.100.0.1:2
Signed-off-by: Ameya Dharkar <adharkar@vmware.com>
If we have something like:
```
ip route 1.1.1.0/24 Null0
!
router bgp 100
no bgp ebgp-requires-policy
neighbor 192.168.0.2 remote-as 200
!
address-family ipv4 unicast
network 1.1.1.0/24
redistribute connected
exit-address-family
!
line vty
!
```
1.1.1.0/24 is not advertised due to martian nexthop (0.0.0.0). It starts
working only when we use `redistribute static`.
By checking if it's a BGP static route we able to announce
1.1.1.0/24 with `network 1.1.1.0/24` without redistribute even when
`bgp import-check` is enabled.
Disabling `bgp import-check` works as well, but it's enabled by default
since 7.4.
Signed-off-by: Donatas Abraitis <donatas.abraitis@gmail.com>
RFC states that time should be in seconds since the epoch.
The code was using system uptime in seconds.
Fixes: #6549
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Announcements that are marked as invalid were previously not revalidated.
This was fixed by replacing the range lookup with a subtree lookup.
Signed-off-by: Marcel Röthke <marcel.roethke@haw-hamburg.de>
Currently the I/O pthread handles incoming/outgoing data
communication with all peers. There is no attempt at modifying
the hold timers. It's sole goal is to read/write data to appropriate
channels. All this data is handled as *events* on the master pthread
in BGP. The problem is that if the master pthread is extremely busy
then any packet read that would be treated as a keepalive event may
happen after the hold timer pops, due to the way thread events are handled
in lib/thread.c.
In a last gap attempt, if we notice that we have incoming data
to proceses on the input Queue, slightly delay the hold timer.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The tr_*_config structs were previously not pre initialized because
every field is initialized explicitly. But future rtrlib version will
introduce additional fields. Preinitialising the entire struct will
ensure forward compatibility.
Signed-off-by: Marcel Röthke <marcel.roethke@haw-hamburg.de>
Problem reported where bgp sessions were being torn down for ibgp
peers with the reason being optional attribute error. Found that
when a route was leaked, the RTs were stripped but the actual
EXTCOMMUNUNITY attribute was not cleared so an empty ecommunity
attribute stayed in the bgp table and was sent in updates.
Ticket: CM-30000
Signed-off-by: Don Slice <dslice@cumulusnetworks.com>
When a peer is bound to a peer-group, the GR flags set on the
peer are over-written.
Update the GR flags for the peer after it has been bound to a
peer-group.
Signed-off-by: NaveenThanikachalam <nthanikachal@vmware.com>
The code in the bgp extcommunity-list function was using
argv_find to get the correct idx. The problem was that
we had already done argv_finds before and idx was non-zero
thus having us always set the seq pointer to what was last
looked up. This causes us to pass in a value to the
underlying function and it would just wisely ignore it
causing a seq number of 0.
We would then write this seq number of 0 and then immediately
reject it on read in again. BOO!
Actually handle argv_find the way it was meant to be.
Ticket:CM-29926
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
When issuing the command `match ip next-hop address`
bgp would crash. This is because the no form of the
command was making the address optional and we would
try to read data we should not be.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
bgp_accept() gets called over and over again when a VRF device is
deleted out from under a bgp listener socket that is bound to it.
Prevent this by noting the error and cancelling ourselves, allowing the
vrf status code to clean up the mess when it receives word about the
change from Zebra.
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
Try to give a bit more useful data about where we
think the connection is trying to come in from.
Hopefully this will let us debug connection issues
a bit faster in cases where there are config issues.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
When received packet is processed in bgp_process_reads(), the data
is copied to static buffer and then copied to stream buffer.
The data can be copied directly to stream buffer which will avoid extra memcpy
Signed-off-by: kssoman <somanks@gmail.com>
Don't attempt to send BFD daemon a message to remove the peer
registration on daemon exit, otherwise we'll access a dangling
interface pointer and we'll crash.
This crash was not previosly possible because the function that built
the message was passing the interface pointer but not using it due to
the exit condition.
In `lib/bfd.c`:
```
void bfd_peer_sendmsg(struct zclient *zclient, struct bfd_info *bfd_info,
int family, void *dst_ip, void *src_ip, char *if_name,
int ttl, int multihop, int cbit, int command,
int set_flag, vrf_id_t vrf_id)
{
struct bfd_session_arg args = {};
size_t addrlen;
/* Individual reg/dereg messages are suppressed during shutdown. */
if (CHECK_FLAG(bfd_gbl.flags, BFD_GBL_FLAG_IN_SHUTDOWN)) {
if (bfd_debug)
zlog_debug(
"%s: Suppressing BFD peer reg/dereg messages",
__func__);
return;
}
```
Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
To remove a BFD profile without removing the BFD configuration just call
`neighbor <A.B.C.D|X:X::X:X|WORD> bfd`.
Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
Allow BGP to use the new API to configure BFD session profiles. Now it
is possible to preconfigure BFD sessions without needing to create the
peers.
Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
"set community accept-own-nexthop" returns "malformed communities"
error. This is because the token matching hits an earlier "accept-own"
and leaves "-nexthop" as a separate token to be processed.
Reorder the switch cases so that both are processed correctly.
Signed-off-by: Appu Joseph <apjo@kaloom.com>
We are crashing in thread_cancel on shutdown because
the thread pointer is NULL. Use the more appropriate
THREAD_CANCEL macro
Ticket: CM-29873
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Extend the next hop tracking for type-2 and type-3 EVPN routes also.
Updates: "bgpd: Add nexthop of received EVPN RT-5 for nexthop tracking"
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
When there is a NHT change and the paths dependent on that NHT are being
evaluated, skip those that are marked for removal or as history.
When a route gets withdrawn, its valid flag is cleared and it is flagged
for removal; in the case of an EVPN route, it is also unimported from
VRFs (L2 and/or L3). bgp_process is then scheduled. Under rare timing
conditions, an NHT update for the route's next hop may arrive right after,
and if routes flagged for removal are not skipped, they may not only be
incorrectly marked as valid but also re-imported in the case of EVPN,
which will be a serious error.
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
Ensure that only if there is a change to the path's validity based
on the NHT update, EVPN import or unimport is invoked.
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
Display next hop resolution information, whether the "detail" option is
specified or not as it is quite fundamental and only minimally increases
the output.
Introduce option to look at a specific NHT entry, which will also show
the paths associated with that entry.
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
Clean up a few lines of cli command installation; remove a
duplicate; follow the command grouping pattern better.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
There can be cases where evpn traffic is not meshed across various
endpoints, but sent to a central pe. For this situation, remove the
nexthop unchanged default behaviour for bgp evpn. Also add route
reflector commands to bgp evpn node.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Non-best paths (path info structures) also need to be freed during
table cleanup not only to release their memory but to also ensure
any linkages are updated correctly. One such example is for EVPN
where there is a link between the imported path info (in a L2 or
L3 vrf instance) and its parent path info.
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
We had already removed the `ip as-path..` command
to have `bgp as-path` but for some reason a `no ip as-path..`
command ALIAS was still around. Kill with extreme prejudice.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Without specifying a default afi/safi we get a segfault:
```
(gdb) frame 4
bgp_table_stats (..., afi=32724, safi=SAFI_UNICAST, ...
11349 if (!bgp->rib[afi][safi]) {
(gdb)
```
Signed-off-by: Donatas Abraitis <donatas.abraitis@gmail.com>
L3VNI is configured with "prefix-routes-only" flag. Even in this case,
intermittently, we observed that local EVPN MACIP routes are installed and
advertised with 2 labels and 2 export RTs.
This is a sequencing issue. Consider following case where L2VNI 200 and L3VNI
1000 are configured for tenant vrf vrf-blue.
Bug is observed for following sequence of events:
1. vrf-blue BGP instance is created.
2. L2VNI is created in bgp for vni 200. It is linked to the tenant vrf vrf-blue
in function bgpevpn_link_to_l3vni.
Following code sets "VNI_FLAG_USE_TWO_LABELS" flag for vni 200 as L3VNI is not
yet attached to vrf-blue BGP instance.
/* check if we are advertising two labels for this vpn */
if (!CHECK_FLAG(bgp_vrf->vrf_flags, BGP_VRF_L3VNI_PREFIX_ROUTES_ONLY))
SET_FLAG(vpn->flags, VNI_FLAG_USE_TWO_LABELS);
2. Now L3VNI is attached to vrf-blue BGP instance. In this case, we set
BGP_VRF_L3VNI_PREFIX_ROUTES_ONLY flag for vrf-blue but we do not clear
VNI_FLAG_USE_TWO_LABELS flag set on the corresponding L2VNIs.
This fix resolves following 2 issues observed above.
1. When L2VNI is created in BGP, flag VNI_FLAG_USE_TWO_LABELS should not be set
for this VNI if BGP vrf is not attached to any L3VNI.
2. When L3VNI is attached to the BGP vrf, set "VNI_FLAG_USE_TWO_LABELS" flag
if "prefix-routes-only" is not for the vrf.
UT cases:
1. Flap "prefix-routes-only" config for a vrf.
2. Test following triggers for vrfs with and without "prefix-routes-only"
- Flap L2VNI from kernel.
- Flap L3VNI from kernel.
Signed-off-by: Ameya Dharkar <adharkar@vmware.com>
The `bgp bestpath bandwidth` command should not be a legal
command. Pull out the `no` form to allow this. Allow
`no bgp bestpath bandwidth` to work as we would expect.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
This is not the attribute involved in path selection and by rfc7606 it should
be just ignored.
Signed-off-by: Donatas Abraitis <donatas.abraitis@gmail.com>
Community attributes might have been removed by an inbound route map, so we
should check to ensure they still exist before trying to free them.
This fixes a segfault described in issue #6345.
Signed-off-by: Josh Cox <josh.cox@pureport.com>
The problem is that peer_af_array returns NULL when SAFI is changed to
unicast. We use unicast table, but peer is created and activated under
labeled-unicast, hence we should lookup with a proper SAFI id.
Without this patch peer_af_find() returns NULL and we can't show
PfxSnt in `show bgp summary`.
Signed-off-by: Donatas Abraitis <donatas.abraitis@gmail.com>
It is possible that the if_lookup_by_index() call will return
a NULL value and calling zclient_send_interface_radv_req. Just
test that we have a valid interface pointer.
Found by Coverity
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
unicast and labeled-unicast share the same table, but configuration should
be visible for both independently. Without this fix it confuses a bit
because when you enter `network 10.0.0.0/24` under labeled-unicast it's
written in unicast family block.
Signed-off-by: Donatas Abraitis <donatas.abraitis@gmail.com>
Modify the import-check command to require the underlying prefix
to exist in the rib. General consensus is that this is the correct
behavior.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Problem reported that in many circumstances, RAs created in the
process of bringing up numbered IPv6 peers with extended-nexthop
capability enabled (for ipv4 over ipv6) were not stopped on the
interface when those peers were deleted. Found several circumstances
where this occurred and fix them in this patch.
Ticket: CM-26875
Signed-off-by: Don Slice <dslice@cumulusnetworks.com>
These are easy to get subtly wrong, and doing so can cause
nondeterministic failures when racing in parallel builds.
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
Issue:
Configuring default-originate when static default route is previously
advertised results in withdrawal of the route.
Fix :
Delete the adj-out entry for the previously advertised static
default route without sending explicit withdraw message.
Signed-off-by: kssoman <somanks@gmail.com>
the nlri flowspec above 240 bytes size was not handled.
Over 240 bytes, the length is 2 bytes length, and a calculation must be
done to obtain the real length. This commit handles it appropriately.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
- Fix 1 byte overflow when showing GR info in bgpd
- Use PATH_MAX for path buffers
- Use unsigned specifiers for uint16_t's in zebra pbr
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>