1. MAC-IP routes in the VPN routing table are linked to the
destination ES for efficient handling for remote ES link flaps.
2. Only MAC-IP paths whose nexthops are active (added via EAD-ES)
are imported into the VRF routing table.
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
* Process FIB update in bgp_zebra_route_notify_owner() and call
group_announce_route() if route is installed
* When bgp update is received for a route which is not installed earlier
(flag BGP_NODE_FIB_INSTALLED is not set) and suppress fib is enabled
set the flag BGP_NODE_FIB_INSTALL_PENDING to indicate fib install is
pending for the route. The route will be advertised when zebra send
ZAPI_ROUTE_INSTALLED status.
* The advertisement delay (BGP_DEFAULT_UPDATE_ADVERTISEMENT_TIME)
is added to allow more routes to be sent in single update message.
This is required since zebra sends route notify message for each route.
The delay will be applied to update group timer which advertises
routes to peers.
Signed-off-by: kssoman <somanks@gmail.com>
The problem is that only prefixes were handled and any other `match`
commands were ignored. Let's do not forget them as well.
Signed-off-by: Donatas Abraitis <donatas.abraitis@gmail.com>
Implemented as per the feature description given in the source link.
Descriprion:
The BGP conditional advertisement feature uses the non-exist-map or exist-map
and the advertise-map keywords of the neighbor advertise-map command in order
to track routes by the route prefix.
non-exist-map :
If a route prefix is not present in output of the non-exist-map command, then
the route specified by the advertise-map command is announced.
exist-map :
If a route prefix is present in output of the exist-map command, then the route
specified by the advertise-map command is announced.
The conditional BGP announcements are sent in addition to the normal
announcements that a BGP router sends to its peers.
The conditional advertisement process is triggered by the BGP scanner process,
which runs every 60 seconds. This means that the maximum time for the conditional
advertisement to take effect is 60 seconds. The conditional advertisement can take
effect sooner, depending on when the tracked route is removed from the BGP table
and when the next instance of the BGP scanner occurs.
Sample Configuration on DUT
---------------------------
Router2# show running-config
Building configuration...
Current configuration:
!
frr version 7.6-dev-MyOwnFRRVersion
frr defaults traditional
hostname router
log file /var/log/frr/bgpd.log
log syslog informational
hostname Router2
service integrated-vtysh-config
!
debug bgp updates in
debug bgp updates out
!
debug route-map
!
ip route 200.200.0.0/16 blackhole
ipv6 route 2001:db8::200/128 blackhole
!
interface enp0s9
ip address 10.10.10.2/24
!
interface enp0s10
ip address 10.10.20.2/24
!
interface lo
ip address 2.2.2.2/24
ipv6 address 2001:db8::2/128
!
router bgp 2
bgp log-neighbor-changes
no bgp ebgp-requires-policy
neighbor 10.10.10.1 remote-as 1
neighbor 10.10.20.3 remote-as 3
!
address-family ipv4 unicast
network 2.2.2.0/24
network 200.200.0.0/16
neighbor 10.10.10.1 soft-reconfiguration inbound
neighbor 10.10.10.1 advertise-map ADVERTISE non-exist-map CONDITION
neighbor 10.10.20.3 soft-reconfiguration inbound
exit-address-family
!
address-family ipv6 unicast
network 2001:db8::2/128
network 2001:db8::200/128
neighbor 10.10.10.1 activate
neighbor 10.10.10.1 soft-reconfiguration inbound
neighbor 10.10.10.1 advertise-map ADVERTISE_6 non-exist-map CONDITION_6
neighbor 10.10.20.3 activate
neighbor 10.10.20.3 soft-reconfiguration inbound
exit-address-family
!
access-list CONDITION seq 5 permit 3.3.3.0/24
access-list ADVERTISE seq 5 permit 2.2.2.0/24
access-list ADVERTISE seq 6 permit 200.200.0.0/16
access-list ADVERTISE seq 7 permit 20.20.0.0/16
!
ipv6 access-list ADVERTISE_6 seq 5 permit 2001:db8::2/128
ipv6 access-list CONDITION_6 seq 5 permit 2001:db8::3/128
!
route-map ADVERTISE permit 10
match ip address ADVERTISE
!
route-map CONDITION permit 10
match ip address CONDITION
!
route-map ADVERTISE_6 permit 10
match ipv6 address ADVERTISE_6
!
route-map CONDITION_6 permit 10
match ipv6 address CONDITION_6
!
line vty
!
end
Router2#
Withdraw when non-exist-map prefixes present in BGP table:
----------------------------------------------------------
Router2# show ip bgp all wide
For address family: IPv4 Unicast
BGP table version is 8, local router ID is 2.2.2.2, vrf id 0
Default local pref 100, local AS 2
Status codes: s suppressed, d damped, h history, * valid, > best, = multipath,
i internal, r RIB-failure, S Stale, R Removed
Nexthop codes: @NNN nexthop's vrf id, < announce-nh-self
Origin codes: i - IGP, e - EGP, ? - incomplete
Network Next Hop Metric LocPrf Weight Path
*> 1.1.1.0/24 10.10.10.1 0 0 1 i
*> 2.2.2.0/24 0.0.0.0 0 32768 i
*> 3.3.3.0/24 10.10.20.3 0 0 3 i
*> 200.200.0.0/16 0.0.0.0 0 32768 i
Displayed 4 routes and 4 total paths
For address family: IPv6 Unicast
BGP table version is 8, local router ID is 2.2.2.2, vrf id 0
Default local pref 100, local AS 2
Status codes: s suppressed, d damped, h history, * valid, > best, = multipath,
i internal, r RIB-failure, S Stale, R Removed
Nexthop codes: @NNN nexthop's vrf id, < announce-nh-self
Origin codes: i - IGP, e - EGP, ? - incomplete
Network Next Hop Metric LocPrf Weight Path
*> 2001:db8::1/128 fe80::a00:27ff:fecb:ad57 0 0 1 i
*> 2001:db8::2/128 :: 0 32768 i
*> 2001:db8::3/128 fe80::a00:27ff:fe76:6738 0 0 3 i
*> 2001:db8::200/128 :: 0 32768 i
Displayed 4 routes and 4 total paths
Router2#
Router2# show ip bgp neighbors 10.10.10.1
BGP neighbor is 10.10.10.1, remote AS 1, local AS 2, external link
!--- Output suppressed.
For address family: IPv4 Unicast
Update group 9, subgroup 5
Packet Queue length 0
Inbound soft reconfiguration allowed
Community attribute sent to this neighbor(all)
Condition NON_EXIST, Condition-map *CONDITION, Advertise-map *ADVERTISE, status: Withdraw
1 accepted prefixes
For address family: IPv6 Unicast
Update group 10, subgroup 6
Packet Queue length 0
Inbound soft reconfiguration allowed
Community attribute sent to this neighbor(all)
Condition NON_EXIST, Condition-map *CONDITION_6, Advertise-map *ADVERTISE_6, status: Withdraw
1 accepted prefixes
!--- Output suppressed.
Router2#
Here 2.2.2.0/24 & 200.200.0.0/16 (prefixes in advertise-map) are withdrawn
by conditional advertisement scanner as the prefix(3.3.3.0/24) specified
by non-exist-map is present in BGP table.
Router2# show ip bgp all neighbors 10.10.10.1 advertised-routes wide
For address family: IPv4 Unicast
BGP table version is 8, local router ID is 2.2.2.2, vrf id 0
Default local pref 100, local AS 2
Status codes: s suppressed, d damped, h history, * valid, > best, = multipath,
i internal, r RIB-failure, S Stale, R Removed
Nexthop codes: @NNN nexthop's vrf id, < announce-nh-self
Origin codes: i - IGP, e - EGP, ? - incomplete
Network Next Hop Metric LocPrf Weight Path
*> 1.1.1.0/24 0.0.0.0 0 1 i
*> 3.3.3.0/24 0.0.0.0 0 3 i
Total number of prefixes 2
For address family: IPv6 Unicast
BGP table version is 8, local router ID is 2.2.2.2, vrf id 0
Default local pref 100, local AS 2
Status codes: s suppressed, d damped, h history, * valid, > best, = multipath,
i internal, r RIB-failure, S Stale, R Removed
Nexthop codes: @NNN nexthop's vrf id, < announce-nh-self
Origin codes: i - IGP, e - EGP, ? - incomplete
Network Next Hop Metric LocPrf Weight Path
*> 2001:db8::1/128 :: 0 1 i
*> 2001:db8::3/128 :: 0 3 i
*> 2001:db8::200/128 :: 0 32768 i
Total number of prefixes 3
Router2#
Advertise when non-exist-map prefixes not present in BGP table:
---------------------------------------------------------------
After Removing 3.3.3.0/24 (prefix present in non-exist-map),
2.2.2.0/24 & 200.200.0.0/16 (prefixes present in advertise-map) are advertised
Router2# show ip bgp all wide
For address family: IPv4 Unicast
BGP table version is 9, local router ID is 2.2.2.2, vrf id 0
Default local pref 100, local AS 2
Status codes: s suppressed, d damped, h history, * valid, > best, = multipath,
i internal, r RIB-failure, S Stale, R Removed
Nexthop codes: @NNN nexthop's vrf id, < announce-nh-self
Origin codes: i - IGP, e - EGP, ? - incomplete
Network Next Hop Metric LocPrf Weight Path
*> 1.1.1.0/24 10.10.10.1 0 0 1 i
*> 2.2.2.0/24 0.0.0.0 0 32768 i
*> 200.200.0.0/16 0.0.0.0 0 32768 i
Displayed 3 routes and 3 total paths
For address family: IPv6 Unicast
BGP table version is 9, local router ID is 2.2.2.2, vrf id 0
Default local pref 100, local AS 2
Status codes: s suppressed, d damped, h history, * valid, > best, = multipath,
i internal, r RIB-failure, S Stale, R Removed
Nexthop codes: @NNN nexthop's vrf id, < announce-nh-self
Origin codes: i - IGP, e - EGP, ? - incomplete
Network Next Hop Metric LocPrf Weight Path
*> 2001:db8::1/128 fe80::a00:27ff:fecb:ad57 0 0 1 i
*> 2001:db8::2/128 :: 0 32768 i
*> 2001:db8::200/128 :: 0 32768 i
Displayed 3 routes and 3 total paths
Router2#
Router2# show ip bgp neighbors 10.10.10.1
!--- Output suppressed.
For address family: IPv4 Unicast
Update group 9, subgroup 5
Packet Queue length 0
Inbound soft reconfiguration allowed
Community attribute sent to this neighbor(all)
Condition NON_EXIST, Condition-map *CONDITION, Advertise-map *ADVERTISE, status: Advertise
1 accepted prefixes
For address family: IPv6 Unicast
Update group 10, subgroup 6
Packet Queue length 0
Inbound soft reconfiguration allowed
Community attribute sent to this neighbor(all)
Condition NON_EXIST, Condition-map *CONDITION_6, Advertise-map *ADVERTISE_6, status: Advertise
1 accepted prefixes
!--- Output suppressed.
Router2#
Router2# show ip bgp all neighbors 10.10.10.1 advertised-routes wide
For address family: IPv4 Unicast
BGP table version is 9, local router ID is 2.2.2.2, vrf id 0
Default local pref 100, local AS 2
Status codes: s suppressed, d damped, h history, * valid, > best, = multipath,
i internal, r RIB-failure, S Stale, R Removed
Nexthop codes: @NNN nexthop's vrf id, < announce-nh-self
Origin codes: i - IGP, e - EGP, ? - incomplete
Network Next Hop Metric LocPrf Weight Path
*> 1.1.1.0/24 0.0.0.0 0 1 i
*> 2.2.2.0/24 0.0.0.0 0 32768 i
*> 200.200.0.0/16 0.0.0.0 0 32768 i
Total number of prefixes 3
For address family: IPv6 Unicast
BGP table version is 9, local router ID is 2.2.2.2, vrf id 0
Default local pref 100, local AS 2
Status codes: s suppressed, d damped, h history, * valid, > best, = multipath,
i internal, r RIB-failure, S Stale, R Removed
Nexthop codes: @NNN nexthop's vrf id, < announce-nh-self
Origin codes: i - IGP, e - EGP, ? - incomplete
Network Next Hop Metric LocPrf Weight Path
*> 2001:db8::1/128 :: 0 1 i
*> 2001:db8::2/128 :: 0 32768 i
*> 2001:db8::200/128 :: 0 32768 i
Total number of prefixes 3
Router2#
Signed-off-by: Madhuri Kuruganti <k.madhuri@samsung.com>
Convert IPv4 and IPv6 unicast address family clis
to transactional clis and implementation of
northbound callbacks.
Signed-off-by: Chirag Shah <chirag@nvidia.com>
Instead of just counting the route suppressions, keep a reference for
all aggregations that are doing it. It should help the with the
following problems:
- Which aggregation suppressed the route.
- Double suppression
- Double unsuppression
- Avoids calling `bgp_process` if already suppressed/unsuppressed.
- Easier code maintenance and understanding
This also fixes a crash when modifying a route map that is
associated with a working aggregate-address.
Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
Add new aggregate-address option to selectively suppress routes based
on route map results.
Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
We currently have a global process queue for handling route
updates in bgp. This is fine, in general, except there are
places and times where we plug the queue for no new work
during certain peer states of bgp update delay. If we
happen to be processing multiple bgp instances on startup
why do we want to stop processing in vrf A when vrf B
is in a bit of a pickle?
Also this separation will allow us to start forward thinking
about how to fully integrate pthreads into route processing
in bgp.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Add code to handle MED matching:
- When MED matches act as normal.
- When MED doesn't match do the following:
* Uninstall the aggregate route
* Unsuppress routes (if using summary-only)
- When MED didn't match, but now matches:
* Install the aggregate route
* Suppress all routes (if using summary-only)
Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
When a SYNC route i.e. a route with a local ES as destination is
rxed on a switch (say L11) from an ES peer (say L12) a local
MAC/neigh entry is created on L11 with the local access port
as dest port.
Creation of the local entry triggers a local path advertisement from
L11. This could be a "locally-active" path or a "locally-inactive"
path. Inactive paths are advertised with the proxy bit.
To ensure that the local entry is not deleted by a SYNC route it is
given absolute precedence over peer-paths.
If there are two non-local paths with the same dest ES and same MM
seq number the non-proxy path is preferred. This is done to ensure
that we don't lose track of the peer-activity.
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
This is the base patch that brings in support for Type-1 routes.
It includes support for -
- Ethernet Segment (ES) management
- EAD route handling
- MAC-IP (Type-2) routes with a non-zero ESI i.e. Aliasing for
active-active multihoming
- Initial infra for consistency checking. Consistency checking
is a fundamental feature for active-active solutions like MLAG.
We will try to levarage the info in the EAD-ES/EAD-EVI routes to
detect inconsitencies in access config across VTEPs attached to
the same Ethernet Segment.
Functionality Overview -
========================
1. Ethernet segments are created in zebra and associated with
access VLANs. zebra sends that info as ES and ES-EVI objects to BGP.
2. BGP advertises EAD-ES and EAD-EVI routes for the locally attached
ethernet segments.
3. Similarly BGP processes EAD-ES and EAD-EVI routes from peers
and translates them into ES-VTEP objects which are then sent to zebra
as remote ESs.
4. Each ES in zebra is associated with a list of active VTEPs which
is then translated into a L2-NHG (nexthop group). This is the ES
"Alias" entry
5. MAC-IP routes with a non-zero ESI use the alias entry created in
(4.) to forward traffic i.e. a MAC-ECMP is done to these remote-ES
destinations.
EAD route management (route table and key) -
============================================
1. Local EAD-ES routes
a. route-table: per-ES route-table
key: {RD=ES-RD, ESI, ET=0xffffffff, VTEP-IP)
b. route-table: per-VNI route-table
Not added
c. route-table: global route-table
key: {RD=ES-RD, ESI, ET=0xffffffff)
2. Remote EAD-ES routes
a. route-table: per-ES route-table
Not added
b. route-table: per-VNI route-table
key: {RD=ES-RD, ESI, ET=0xffffffff, VTEP-IP)
c. route-table: global route-table
key: {RD=ES-RD, ESI, ET=0xffffffff)
3. Local EAD-EVI routes
a. route-table: per-ES route-table
Not added
b. route-table: per-VNI route-table
key: {RD=0, ESI, ET=0, VTEP-IP)
c. route-table: global route-table
key: {RD=L2-VNI-RD, ESI, ET=0)
4. Remote EAD-EVI routes
a. route-table: per-ES route-table
Not added
b. route-table: per-VNI route-table
key: {RD=0, ESI, ET=0, VTEP-IP)
c. route-table: global route-table
key: {RD=L2-VNI-RD, ESI, ET=0)
Please refer to bgp_evpn_mh.h for info on how the data-structures are
organized.
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
Add ESI as an inline attribute field along with the other EVPN
attributes. This may be re-worked when the rest of the EVPN
attributes find a new home.
Some cleanup has been done to get rid of stale/unused references
to ESI. And also to consolidate duplicate definitions of ES ID
types.
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
This is the bulk part extracted from "bgpd: Convert from `struct
bgp_node` to `struct bgp_dest`". It should not result in any functional
change.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
Problem Description:
=====================
+--+ +--+
|R1|-(192.201.202.1)----iBGP----(192.201.202.2)-|R2|
+--+ +--+
Routes on R2:
=============
S>* 202.202.202.202/32 [1/0] via 192.201.78.1, ens256, 00:40:48
Where, the next-hop network, 192.201.78.0/24, is a directly connected network address.
C>* 192.201.78.0/24 is directly connected, ens256, 00:40:48
Configurations on R1:
=====================
!
router bgp 201
bgp router-id 192.168.0.1
neighbor 192.201.202.2 remote-as 201
!
Configurations on R2:
=====================
!
ip route 202.202.202.202/32 192.201.78.1
!
router bgp 201
bgp router-id 192.168.0.2
neighbor 192.201.202.1 remote-as 201
!
address-family ipv4 unicast
redistribute static
exit-address-family
!
Step-1:
=======
R1 receives the route 202.202.202.202/32 from R2.
R1 installs the route in its BGP RIB.
Step-2:
=======
On R1, a connected interface address is added.
The address is the same as the next-hop of the BGP route received from R2 (192.201.78.1).
Point of Failure:
=================
R1 resolves the BGP route even though the route's next-hop is its own connected address.
Even though this appears to be a misconfiguration it would still be better to safeguard the code against it.
Fix:
====
When BGP receives a connected route from Zebra, it processes the
routes for the next-hop update.
While doing so, BGP must ignore routes whose next-hop address matches
the address of the connected route for which Zebra sent the next-hop update
message.
Signed-off-by: NaveenThanikachalam <nthanikachal@vmware.com>
This macro is undefined if vnc is disabled, and while it defaults to 0,
this is still wrong and causes issues with -Werror
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
Implement the code to handle the other route-map options to generate
the link bandwidth, namely, to use the cumulative bandwidth or to
base this on the number of multipaths. In the latter case, a reference
bandwidth is internally chosen - the implementation uses a value of
1 Gbps.
These additional options mean that the prefix may need to be advertised
if there is a link bandwidth change, which is a new criteria. Define a
new path (change) flag to support this and implement the advertisement.
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
Reviewed-by: Donald Sharp <sharpd@cumulusnetworks.com>
Reviewed-by: Don Slice <dslice@cumulusnetworks.com>
Modify code to use lookup function agg_node_get_prefix()
as the abstraction layer. When we rework bgp_node to
bgp_dest this will allow us to greatly limit the amount
of work needed to do that.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Future work needs the ability to specify a
const struct prefix value. Iterate into
bgp a bit to get this started.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Modify more code to use `const struct prefix` throughout
bgp. This is all prep work for adding an accessor function
for bgp_node to get the prefix and reduce all the places that
code needs to be touched when we get that work done.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Some were converted to bool, where true/false status is needed.
Converted to void only those, where the return status was only false or true.
Signed-off-by: Donatas Abraitis <donatas.abraitis@gmail.com>
Ensure that the EVPN advertise route-map is applied on a copy of the
original path_info and associated attribute, so that if the route-map
has SET clauses, they can operate properly. This closely follows
the model already in use in other route-map application code.
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
Reviewed-by: Don Slice <dslice@cumulusnetworks.com>
Reviewed-by: Donald Sharp <sharpd@cumulusnetworks.com>
Override ORIGIN attribute if defined.
E.g.: Cisco and Juniper set ORIGIN for aggregated address
to IGP which is not what rfc4271 says.
This enables the same behavior, optionally.
Signed-off-by: Donatas Abraitis <donatas.abraitis@gmail.com>
* Selection Deferral Timer for Graceful Restart.
* Added selection deferral timer handling function.
* Route marking as selection defer when update message is received.
* Staggered processing of routes which are pending best selection.
* Fix for multi-path test case.
Signed-off-by: Biswajit Sadhu <sadhub@vmware.com>
bgpd already supports BGP Prefix-SID path attribute and
there are some sub-types of Prefix-SID path attribute.
This commits makes bgpd to support additional sub-types.
sub-Type-4 and sub-Type-5 for construct the VPNv4 SRv6 backend
with vpnv4-unicast address family.
This path attributes is already supported by Ciscos IOS-XR and NX-OS.
Prefix-SID sub-Type-4 and sub-Type-5 is defined on following
IETF-drafts.
Supports(A-part-of):
- https://tools.ietf.org/html/draft-dawra-idr-srv6-vpn-04
- https://tools.ietf.org/html/draft-dawra-idr-srv6-vpn-05
Signed-off-by: Hiroki Shirokura <slank.dev@gmail.com>
By default announct Self Type-2 routes with
system IP as nexthop and system MAC as
nexthop.
An API to check type-2 is self route via
checking ipv4/ipv6 address from connected interfaces list.
An API to extract RMAC and nexthop for type-2
routes based on advertise-svi-ip knob is enabled.
When advertise-pip is enabled/disabled, trigger type-2
route update. For self type-2 routes to use
anycast or individual (rmac, nexthop) addresses.
Ticket:CM-26190
Reviewed By:
Testing Done:
Enable 'advertise-svi-ip' knob in bgp default instance.
the vrf instance svi ip is advertised with nexthop
as default instance router-id and RMAC as system MAC.
Signed-off-by: Chirag Shah <chirag@cumulusnetworks.com>
This is annoying when editing a file and saving the file. IDEs like
VSCode can automatically remove trailing whitespaces, hence it would be better
having a clean code before pushing other changes.
I step onto this not the first time.
Signed-off-by: Donatas Abraitis <donatas.abraitis@gmail.com>
BMP uses this to get notified about any changes to prefixes, at which
point it schedules its own processing to happen later.
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
FRR has a provision to give exact-match in match clause for
standard community, but this option is missing for lcommunity.
Part 3 : show related changes for match clause
Signed-off-by: vishaldhingra <vdhingra@vmware.com>
Store in bgp_node the reason why we choose a particular
best path over another. At this point we do not do
anything other than just store this data when we make
the decision. Future commits will display it.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Instead of just passing in the prefix, pass in the particular
bgp_node we are using.
This is setup for a future commit to use this data.
The long term goal is to collect data about why
a particular bgp_path_info was selected as best and
to display that reason.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The bgp_path_info_new function whenever it was called
pretty much duplicated the info_make function call. So
convert over to using it and remove the bgp_path_info_new
function so people are not tempted.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
With this commit:
1) 'struct bgp_aggregate' is moved to bgp_route.h from bgp_route.c
2) Hashes to accommodate the as-path, communities, extended-communities and
large-communities attributes of all the routes aggregated by an
aggregate route is introduced in 'struct bgp_aggregate'.
3) Place-holders for the aggregate route's as-path, communities,
extended-communities and large-communities attributes are introduced in
'struct bgp_aggregate'.
4) The code to manage the as-path of the routes that are aggregatable under
a configured aggregate-address is introduced.
5) The code to compute the aggregate-route's as-path is introduced.
Signed-off-by: NaveenThanikachalam <nthanikachal@vmware.com>
IPv4 or IPv6 unicast routes which are imported from EVPN routes
(type-2 or type-5) and installed in a BGP instance and then leaked
do not need any nexthop tracking, as any tracking should happen in
the source instance.
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
Reviewed-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
Reviewed-by: Donald Sharp <sharpd@cumulusnetworks.com>
A few code paths weren't handling the vpnv6 nexthop lenghts as
expected, which was leading to problems like imported vpnv6 routes
not being marked as valid when they should. Fix this.
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
the list of iprules is displayed in the 'show bgp ipv4 flowspec detail'
The list of iprules is displayed, only if it is installed.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
The motivation for this patch is to address a concerning behavior of
tx-addpath-bestpath-per-AS. Prior to this patch, all paths' TX ID was
pre-determined as the path was received from a peer. However, this meant
that any time the path selected as best from an AS changed, bgpd had no
choice but to withdraw the previous best path, and advertise the new
best-path under a new TX ID. This could cause significant network
disruption, especially for the subset of prefixes coming from only one
AS that were also communicated over a bestpath-per-AS session.
The patch's general approach is best illustrated by
txaddpath_update_ids. After a bestpath run (required for best-per-AS to
know what will and will not be sent as addpaths) ID numbers will be
stripped from paths that no longer need to be sent, and held in a pool.
Then, paths that will be sent as addpaths and do not already have ID
numbers will allocate new ID numbers, pulling first from that pool.
Finally, anything left in the pool will be returned to the allocator.
In order for this to work, ID numbers had to be split by strategy. The
tx-addpath-All strategy would keep every ID number "in use" constantly,
preventing IDs from being transferred to different paths. Rather than
create two variables for ID, this patch create a more generic array that
will easily enable more addpath strategies to be implemented. The
previously described ID manipulations will happen per addpath strategy,
and will only be run for strategies that are enabled on at least one
peer.
Finally, the ID numbers are allocated from an allocator that tracks per
AFI/SAFI/Addpath Strategy which IDs are in use. Though it would be very
improbable, there was the possibility with the free-running counter
approach for rollover to cause two paths on the same prefix to get
assigned the same TX ID. As remote as the possibility is, we prefer to
not leave it to chance.
This ID re-use method is not perfect. In some cases you could still get
withdraw-then-add behaviors where not strictly necessary. In the case of
bestpath-per-AS this requires one AS to advertise a prefix for the first
time, then a second AS withdraws that prefix, all within the space of an
already pending MRAI timer. In those situations a withdraw-then-add is
more forgivable, and fixing it would probably require a much more
significant effort, as IDs would need to be moved to ADVs instead of
paths.
Signed-off-by Mitchell Skiba <mskiba@amazon.com>
Do a straight conversion of `struct bgp_info` to `struct bgp_path_info`.
This commit will setup the rename of variables as well.
This is being done because `struct bgp_info` is not descriptive
of what this data actually is. It is path information for routes
that we keep to build the actual routes nexthops plus some extra
information.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Problem reported that some bgp and ospf json commands did not return
any json output at all if the bgp/ospf instance did not exist.
Additionally, some bgp and ospf json commands did not return any json
output if the instance existed but no neighbors were defined. This
fix makes these commands more consistent in returning empty braces for
json output and issue a message if not using json output. Additionally,
made the flag "use_json" a bool to make it consistent since previously,
it had been defined as an int, char, u_char, and bool at various places.
Ticket: CM-21040
Signed-off-by: Don Slice <dslice@cumulusnetworks.com>
Because one flowspec entry can create 1-N bgp pbr entries, the list is
now updated and visible. Also, because the bgp_extra structure is used,
this list is flushed when the bgp_extra structure is deleted.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Once PBR rules installed, an information is printed in the main
show bgp ipv4 flowspec detail information.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
This commit changes the behavior of `show bgp [afi] [safi] neighbor
<neighbor> received-routes [json]` to return all received prefixes
instead of filtering rejected/denied prefixes.
Compared to Cisco and Juniper products, this is the usual way how this
command is supposed to work, as `show bgp [afi] [safi] neighbor
<neighbor> routes` will already return all accepted prefixes.
Additionally, the new command `show bgp [afi] [safi] neighbor <neighbor>
filtered-routes` has been added, which returns a list of all prefixes
that got filtered away, so it can be roughly described as a subset of
"received prefixes - accepted prefixes".
As the already available `filtered_count` variable inside
`show_adj_route` has not been used before, the last output line
summarizing the amount of prefixes found was extended to also mention
the amount of filtered prefixes if present.
Signed-off-by: Pascal Mathis <mail@pascalmathis.com>
As part of recent vpn-vrf leaking changes, it is now possible for a
route to refer to a nexthop in a different vrf. There is also a new
route flag that means "when announcing this route, indicate myself
as the next-hop."
route_vty_out(): nexthops are appended with:
"@VRFID" (where VRFID is the numerical vrf id) when different from
the route's vrf;
"<" when the route's BGP_INFO_ANNC_NH_SELF is set
This change also shows the route table's vrf id in the table header.
route_vty_out_detail(): show nexthop's vrf and announce-nh-self flag if
appropriate.
Both functions are also augmented to add json elements nhVrfId, nhVrfName,
and announceNexthopSelf as appropriate.
The intent of these changes is to make it easier to understand/debug
the relationship between a route and its nexthops.
Signed-off-by: G. Paul Ziemba <paulz@labn.net>
Routes that have labels must be sent via a nexthop that also has labels.
This change notes whether any path in a nexthop update from zebra contains
labels. If so, then the nexthop is valid for routes that have labels.
If a nexthop update has no labeled paths, then any labeled routes
referencing the nexthop are marked not valid.
Add a route flag BGP_INFO_ANNC_NH_SELF that means "advertise myself
as nexthop when announcing" so that we can track our notion of the
nexthop without revealing it to peers.
Signed-off-by: G. Paul Ziemba <paulz@labn.net>
The show bgp ipv4 flowspec routine is made available, displays the
flowspec rules contained in the BGP FIB database, as well as the actions
to be done on those rules. Two routines are available:
show bgp ipv4 flowspec
show bgp ipv4 flowspec detail
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
The following types are nonstandard:
- u_char
- u_short
- u_int
- u_long
- u_int8_t
- u_int16_t
- u_int32_t
Replace them with the C99 standard types:
- uint8_t
- unsigned short
- unsigned int
- unsigned long
- uint8_t
- uint16_t
- uint32_t
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
- add "debug bgp vpn label" CLI
- improved debug messages for "debug bgp bestpath"
- send vrf label to zebra after zebra informs bgpd of vrf_id
- withdraw vrf_label from zebra if zebra informs bgpd that vrf_id is disabled
Signed-off-by: G. Paul Ziemba <paulz@labn.net>
When doing symmetric routing,
EVPN type-2 (MACIP) routes need to be advertised with two labels (VNIs)
the first being the L2 VNI (identifying the VLAN) and
the second being the L3 VNI (identifying the VRF).
The receive processing needs to handle one or two labels too.
Ticket: CM-18489
Review: CCR-6949
Testing: manual and bgp/evpn/mpls smoke
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
CLI config for enabling/disabling type-5 routes
router bgp <as> vrf <vrf>
address-family l2vpn evpn
[no] advertise <ipv4|ipv6|both>
loop through all the routes in VRF instance and advertise/withdraw
all ip routes as type-5 routes in default instance.
Signed-off-by: Mitesh Kanjariya <mitesh@cumulusnetworks.com>
When BGP is being redistributed prefixes, allow it to
understand the nexthop type.
This fixes the issue where a blackhole route was being interpreted
to having a nexthop of 1.0.0.0( ruh-roh!!! ). This broke
downstream neighbors as that they would receive a 1.0.0.0 nexthop,
which is bad, very very bad.
This commit sets us up for the future where we can match
a route-map against a nexthop type. In that bgp is
now at least nominally paying attention to the type.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
afi_header_vty_out() is easily replaced with vty_frame(), which means we
can drop a whole batch of "int *write" args as well as the entirety of
bgp_config_write_family_header().
=> AFI/SAFI config writing is now a lot simpler.
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
Some differences compared to the old API:
* Now the redistributed routes are sent using address-family
independent messages (ZEBRA_REDISTRIBUTE_ROUTE_ADD and
ZEBRA_REDISTRIBUTE_ROUTE_DEL). This allows us to unify the ipv4/ipv6
zclient callbacks in the client daemons and thus remove a lot of
duplicate code;
* Now zebra sends all nexthops of the redistributed routes to the client
daemons, not only the first one. This shouldn't have any noticeable
performance implications and will allow us to remove an ugly exception
we had for ldpd (which needs to know all nexthops of the redistributed
routes). The other client daemons can simply ignore the nexthops if
they want or consult just the first one (e.g. ospfd/ospf6d/ripd/ripngd).
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
There are two parts to this commit:
1. create a database of self tunnel-ip for used in martian nexthop check
In a CLAG setup, the tunnel-ip (VNI UP) notification comes before the clag-anycast-ip comes up in the system.
This was causing our self next hop check to fail and we were instaling routes with martian nexthop in zebra.
We need to keep this info in a seperate database for all local tunnel-ip.
This database will be used in parallel with the self next hop database to martian nexthop checks.
2. When a local VNI comes up, update the tunnel-ip database and filter routes in the RD table if necessary
In case of EVPN we might receive routes from clag peer before the clag-anycast ip and VNI is up on the system.
We will store the routes in the RD table for later processing.
When VNI comes UP, we loop thorugh all the routes and install them in zebra if required.
However, we were missing the martian nexthop check in this code path.
From now onwards, when a VNI comes UP,
we will first update the tunnel-ip database
We then loop through all the routes in RD table and apply martian next hop filter if required.
Things not covered in this commit but are required:
This processing is needed in general when an address becomes a connected address.
We need to loop through all the routes in BGP and apply martian nexthop filter if necessary.
This will be taken care in a seperate bug
Ticket:CM-17271/CM-16911
Reviewed By: ccr-6542
Testing Done: Manual
Signed-off-by: Mitesh Kanjariya <mitesh@cumulusnetworks.com>
This reverts commit c14777c6bf.
clang 5 is not widely available enough for people to indent with. This
is particularly problematic when rebasing/adjusting branches.
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
Most of the attributes in 'struct attr_extra' allow for
the more interesting cases of using bgp. The extra
overhead of managing it will induce errors as we add
more attributes and the extra memory overhead is
negligible on anything but full bgp feeds.
Additionally this greatly simplifies the code for
the handling of data.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
bgpd: Fix missing label set
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Core EVPN route handling functionality. This includes support for the
following:
- interface with zebra to learn about local VNIs and MACIPs as well as
to install remote VTEPs (per VNI) and remote MACIPs
- create/update/delete EVPN type-2 and type-3 routes
- attribute creation, route selection and install
- route handling per VNI and for the global routing table
- parsing of received EVPN routes and handling by route type
- encoding attributes for EVPN routes and EVPN prefix creation (for
Updates)
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
Reviewed-by: Donald Sharp <sharpd@cumulusnetworks.com>
Reviewed-by: Daniel Walton <dwalton@cumulusnetworks.com>
Signed-off-by: Daniel Walton <dwalton@cumulusnetworks.com>
- All ipv4 labeled-unicast routes are now installed in the ipv4 unicast
table. This allows us to do things like take routes from an ipv4
unicast peer, allocate a label for them and TX them to a ipv4
labeled-unicast peer. We can do the opposite where we take routes from
a labeled-unicast peer, remove the label and advertise them to an ipv4
unicast peer.
- Multipath over a labeled route and non-labeled route is not allowed.
- You cannot activate a peer for both 'ipv4 unicast' and 'ipv4
labeled-unicast'
- The 'tag' variable was overloaded for zebra's route tag feature as
well as the mpls label. I added a 'mpls_label_t mpls' variable to
avoid this. This is much cleaner but resulted in touching a lot of
code.
The FSF's address changed, and we had a mixture of comment styles for
the GPL file header. (The style with * at the beginning won out with
580 to 141 in existing files.)
Note: I've intentionally left intact other "variations" of the copyright
header, e.g. whether it says "Zebra", "Quagga", "FRR", or nothing.
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
Pointer to void always points at the same thing so make it the type of
the thing that it points at
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
Implement BGP Prefix-SID IETF draft to be able to signal a labeled-unicast
prefix with a label index (segment ID). This makes it easier to deploy
global MPLS labels with BGP, even without other aspects of Segment Routing
implemented.
This patch implements configuration of the global label block (SRGB) and
configuration of a label-index for a network in BGP.
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
Implement support for negotiating IPv4 or IPv6 labeled-unicast address
family, exchanging prefixes and installing them in the routing table, as
well as interactions with Zebra for FEC registration. This is the
implementation of RFC 3107.
Signed-off-by: Don Slice <dslice@cumulusnetworks.com>
With the changed API, some adaptations are done in rfapi code, and in
bgpd evpn code. For evpn code, the internal storage of routermac addr is
kept as struct ethaddr structure. Also the evpn add_routermac api has as
incoming parameter a struct ethaddr param.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
The introduction of AFI_L2VPN prefix makes usage of AFI_ETHER deprecated
and is of no usage currently. The former replaces the latter one.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
A new vty command available under evpn address family. This command
takes following format:
(af-evpn)# [no] network <A.B.C.D/M|X:X::X:X/M> rd ASN:nn_or_IP-address:nn ethtag WORD
label WORD esi WORD gwip A.B.C.D routermac WORD
[route-map WORD]
Among new parameters, ethtag stands for the ethernet tag indentifier.
ESI stands for the ethernet segment identifier, and must be entered in
following format: 00:11:22:33:44:55:66:77:88:99.
gwip stands for the gateway IP address contained in RT5 message. A
check is done on that value since if gwip is ipv4, then ip prefix must
be ipv4. The same for ipv6.
RouterMAc is the gateway mac address sent as extended community
attribute.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
This patch introduces show show bgp evpn commands to dump
NLRI entries configured or received on BGP, related to EVPN
New command introduced is the following:
show [ip] bgp l2vpn evpn [all | rd <rd name> ] [overlay]
Like for MPLS, similar set of commands is added for EVPN:
show [ip] bgp l2vpn evpn [all|rd <RDNAME>]
show [ip] bgp l2vpn evpn all neighbor <NEIGHBOR> routes
show [ip] bgp l2vpn evpn all neighbor <NEIGHBOR> advertised-routes
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
This field can be either IPv4 or IPv6 address and is filled in in
bgp_static configuration structure.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
As per draft-ietf-idr-tunnel-encaps-02, section 3.2.1, BGP Encap
attribute supports vxlan tunnel type. A new tunnel attribute has been
appended to subtlv list, describing the vxlan network identifier to
be used for the routing information of the BGP update message.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
The commit introduces the changes to be done to carry route type 5 EVPN
information in bgp extra attribute information. The commit also handles
the update processing for route type 5 information, including ESI,
gatewayIP and label information.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
This patch introduces code to receive a NLRI message with route type
5, as defined in draft-ietf-bess-evpn-prefix-advertisement-02. It
It increases the number of parameters to extract from the NLRI and
to store into bgp extra information structure. Those parameters are
the ESI (ethernet segment identifier), the gateway IP Address (which
acts like nexthop attribute but is contained inside the NLRI itself)
and the ethernet tag identifier ( that acts for the VXLan Identifier)
This patch updates bgp_update() and bgp_withdraw() api, and then does the
necessary adapations for rfapi.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
This bgp_show_type enumerate was duplicated and modified in several
places. The commit takes the enumerate with the biggest enumerate, so
that it can be used by all the functions using this enumerate.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
* bgpd parses NLRIs twice, a first pass "sanity check" and then a second pass
that changes actual state. For most AFI/SAFIs this is done by
bgp_nlri_sanity_check and bgp_nlri_parse, which are almost identical.
As the required action on a syntactic error in an NLRI is to NOTIFY and
shut down the session, it should be acceptable to just do a one pass
parse. There is no need to atomically handle the NLRIs.
* bgp_route.h: (bgp_nlri_sanity_check) Delete
* bgp_route.c: (bgp_nlri_parse) Make the prefixlen size check more general
and don't hard-code AFI/SAFI details, e.g. use prefix_blen library function.
Add error logs consistent with bgp_nlri_sanity_check as much as possible.
Add a "defense in depth" type check of the prefixlen against the sizeof
the (struct prefix) storage - ala bgp_nlri_parse_vpn.
Update standards text from draft RFC4271 to the actual RFC4271 text.
Extend the semantic consistency test of IPv6. E.g. it should skip mcast
NLRIs for unicast safi as v4 does.
* bgp_mplsvpn.{c,h}: Delete bgp_nlri_sanity_check_vpn and make
bgp_nlri_parse_vpn_body the bgp_nlri_parse_vpn function again.
(bgp_nlri_parse_vpn) Remove the notifies. The sanity checks were
responsible for this, but bgp_update_receive handles sending NOTIFY
generically for bgp_nlri_parse.
* bgp_attr.c: (bgp_mp_reach_parse,bgp_mp_unreach_parse) Delete sanity check.
NLRI parsing done after attr parsing by bgp_update_receive.
Arising out of discussions on the need for two-pass NLRI parse with:
Lou Berger <lberger@labn.net>
Donald Sharp <sharpd@cumulusnetworks.com>
* bgp_packet.c: (bgp_update_receive) Lots of repeated code, doing same
thing for each AFI/SAFI. Except when it doesn't, e.g. the IPv4/VPN
case was missing the EoR bgp_clear_stale_route call - the only action
really needed for EoR.
Make this function a lot more regular, using common, AFI/SAFI
independent blocks so far as possible.
Replace the 4 separate bgp_nlris with an array, indexed by an enum.
The distinct blocks that handle calling bgp_nlri_parse for each
different AFI/SAFI can now be replaced with a loop.
Transmogrify the nlri SAFI from the SAFI_MPLS_LABELED_VPN code-point
used on the wire, to the SAFI_MPLS_VPN safi_t enum we use internally
as early as possible.
The existing code was not necessarily sending a NOTIFY for NLRI
parsing errors, if they arose via bgp_nlri_sanity_check. Send the
correct NOTIFY - INVAL_NETWORK for the classic NLRIs and OPT_ATTR_ERR
for the MP ones.
EoR can now be handled in one block. The existing code seemed broken
for EoR recognition in a number of ways:
1. A v4/unicast EoR should be an empty UPDATE. However, it seemed
to be treating an UPDATE with attributes, inc. MP REACH/UNREACH,
but no classic NLRIs, as a v4/uni EoR.
2. For other AFI/SAFIs, it was treating UPDATEs with no classic
withraw and with a zero-length MP withdraw as EoRs. However, that
would mean an UPDATE packet _with_ update NLRIs and a 0-len MP
withdraw could be classed as an EoR.
This seems to be loose coding leading to ambiguous protocol
situations and likely incorrect behaviour, rather than simply being
liberal. Be more strict about checking that an UPDATE really is an
EoR and definitely is not trying to update any NLRIs.
This same loose EoR parsing was noted by Chris Hall previously on
list.
(bgp_nlri_parse) Front end NLRI parse function, to fan-out to the correct
parser for the AFI/SAFI.
* bgp_route.c: (bgp_nlri_sanity_check) We try convert NLRI safi to
internal code-point ASAP, adjust switch for that. Leave the wire
code point in for defensive coding.
(bgp_nlri_parse) rename to bgp_nlri_parse_ip.
* tests/bgp_mp_attr_test.c: Can just use bgp_nlri_parse frontend.
* bgp_route.h: (bgp_nlri_sanity_check) The bulk of the args are equivalent
to a (struct bgp_nlri), consolidate.
* bgp_route.c: (bgp_nlri_sanity_check) Make this a frontend for all afi/safis.
Including SAFI_MPLS_LABELED_VPN.
(bgp_nlri_sanity_check_ip) Regular IP NLRI sanity check based on the
existing code, and adjusted for (struct bgp_nlri *) arg.
* bgp_attr.c: (bgp_mp_reach_parse) Adjust for passing (struct bgp_nlri *)
to bgp_nlri_sanity_check.
Get rid of special-casing to not sanity check VPN.
(bgp_mp_unreach_parse) Ditto.
* bgp_mplsvpn.c: Use the same VPN parsing code for both the sanity
check and the actual parse.
(bgp_nlri_parse_vpn) renamed to bgp_nlri_parse_vpn_body and made
internal.
(bgp_nlri_parse_vpn_body) Added (bool) argument to control whether it
is sanity checking or whether it should update routing state for each
NLRI. Send a NOTIFY and reset the session, if there's a parsing
error, as bgp_nlri_sanity_check_ip does, and as is required by the
RFC.
(bgp_nlri_parse_vpn) now a wrapper to call _body with update.
(bgp_nlri_sanity_check_vpn) wrapper to call parser without
updating.
* bgp_mplsvpn.h: (bgp_nlri_sanity_check_vpn) export for
bgp_nlri_sanity_check.
* bgp_packet.c: (bgp_update_receive) Adjust for bgp_nlri_sanity_check
argument changes.
* test/bgp_mp_attr_test.c: Extend to also test the NLRI parsing functions,
if the initial MP-attr parsing has succeeded. Fix the NLRI in the
VPN cases. Add further VPN tests.
* tests/bgpd.tests/testbgpmpattr.exp: Add the new test cases.
This commit a joint effort of:
Lou Berger <lberger@labn.net>
Donald Sharp <sharpd@cumulusnetworks.com>
Paul Jakma <paul.jakma@hpe.com> / <paul@jakma.org>
Until today the admin distance cannot be configured for any IPv6
routing protocol. This patch implements it for bgp.
Signed-off-by: Maitane Zotes <maz@open.ch>
Signed-off-by: Roman Hoog Antink <rha@open.ch>
This patch improves zebra,ripd,ripngd,ospfd and bgpd so that they can
make use of 32-bit route tags in the case of zebra,ospf,bgp or 16-bit
route-tags in the case of ripd,ripngd.
It is based on the following patch:
commit d25764028829a3a30cdbabe85f32408a63cccadf
Author: Paul Jakma <paul.jakma@hpe.com>
Date: Fri Jul 1 14:23:45 2016 +0100
*: Widen width of Zserv routing tag field.
But also contains the changes which make this actually useful for all
the daemons.
Signed-off-by: Christian Franke <chris@opensourcerouting.org>
This feature adds an L3 & L2 VPN application that makes use of the VPN
and Encap SAFIs. This code is currently used to support IETF NVO3 style
operation. In NVO3 terminology it provides the Network Virtualization
Authority (NVA) and the ability to import/export IP prefixes and MAC
addresses from Network Virtualization Edges (NVEs). The code supports
per-NVE tables.
The NVE-NVA protocol used to communicate routing and Ethernet / Layer 2
(L2) forwarding information between NVAs and NVEs is referred to as the
Remote Forwarder Protocol (RFP). OpenFlow is an example RFP. For
general background on NVO3 and RFP concepts see [1]. For information on
Openflow see [2].
RFPs are integrated with BGP via the RF API contained in the new "rfapi"
BGP sub-directory. Currently, only a simple example RFP is included in
Quagga. Developers may use this example as a starting point to integrate
Quagga with an RFP of their choosing, e.g., OpenFlow. The RFAPI code
also supports the ability import/export of routing information between
VNC and customer edge routers (CEs) operating within a virtual
network. Import/export may take place between BGP views or to the
default zebera VRF.
BGP, with IP VPNs and Tunnel Encapsulation, is used to distribute VPN
information between NVAs. BGP based IP VPN support is defined in
RFC4364, BGP/MPLS IP Virtual Private Networks (VPNs), and RFC4659,
BGP-MPLS IP Virtual Private Network (VPN) Extension for IPv6 VPN . Use
of both the Encapsulation Subsequent Address Family Identifier (SAFI)
and the Tunnel Encapsulation Attribute, RFC5512, The BGP Encapsulation
Subsequent Address Family Identifier (SAFI) and the BGP Tunnel
Encapsulation Attribute, are supported. MAC address distribution does
not follow any standard BGB encoding, although it was inspired by the
early IETF EVPN concepts.
The feature is conditionally compiled and disabled by default.
Use the --enable-bgp-vnc configure option to enable.
The majority of this code was authored by G. Paul Ziemba
<paulz@labn.net>.
[1] http://tools.ietf.org/html/draft-ietf-nvo3-nve-nva-cp-req
[2] https://www.opennetworking.org/sdn-resources/technical-library
Now includes changes needed to merge with cmaster-next.
This changes the existing _vpnv4 functions for MPLS-VPN into
SAFI-agnostic functions, renaming them from *_vpnv4 to *_safi.
Also adds route-map support while at it.
Signed-off-by: Lou Berger <lberger@labn.net>
Reviewed-by: David Lamparter <equinox@opensourcerouting.org>
(cherry picked from commit a76d9ca3584c1751a592457c167c1e146648ceb6)
Conflicts:
bgpd/bgp_route.c
Various changes and fixes related to VRF registration, deletion,
BGP exit etc.
- Define instance type
- Ensure proper handling upon instance create, delete and
VRF add/delete from zebra
- Cleanup upon bgp_exit()
- Ensure messages are not sent to zebra for unknown VRFs
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
Reviewed-by: Donald Sharp <sharpd@cumulusnetworks.com>
Ticket: CM-9128, CM-7203
Reviewed By: CCR-4098
Testing Done: Manual
Signed-off-by: Daniel Walton <dwalton@cumulusnetworks.com>
Reviewed-by: Donald Sharp <sharpd@cumulusnetworks.com>
Ticket: CM-8122
per draft-ietf-idr-ix-bgp-route-server-09:
2.3.2.2.2. BGP ADD-PATH Approach
The [I-D.ietf-idr-add-paths] Internet draft proposes a different
approach to multiple path propagation, by allowing a BGP speaker to
forward multiple paths for the same prefix on a single BGP session.
As [RFC4271] specifies that a BGP listener must implement an implicit
withdraw when it receives an UPDATE message for a prefix which
already exists in its Adj-RIB-In, this approach requires explicit
support for the feature both on the route server and on its clients.
If the ADD-PATH capability is negotiated bidirectionally between the
route server and a route server client, and the route server client
propagates multiple paths for the same prefix to the route server,
then this could potentially cause the propagation of inactive,
invalid or suboptimal paths to the route server, thereby causing loss
of reachability to other route server clients. For this reason, ADD-
PATH implementations on a route server should enforce send-only mode
with the route server clients, which would result in negotiating
receive-only mode from the client to the route server.
This allows us to delete all of the following code:
- All XXXX_rsclient() functions
- peer->rib
- BGP_TABLE_MAIN and BGP_TABLE_RSCLIENT
- RMAP_IMPORT and RMAP_EXPORT
Signed-off-by: Daniel Walton <dwalton@cumulusnetworks.com>
Reviewed-by: Donald Sharp <sharpd@cumulusnetworks.com>
Reviewed-by: Vivek Venkataraman <vivek@cumulusnetworks.com
Ticket: CM-8014
This implements addpath TX with the first feature to use it
being "neighbor x.x.x.x addpath-tx-all-paths".
One change to show output is 'show ip bgp x.x.x.x'. If no addpath-tx
features are configured for any peers then everything looks the same
as it is today in that "Advertised to" is at the top and refers to
which peers the bestpath was advertise to.
root@superm-redxp-05[quagga-stash5]# vtysh -c 'show ip bgp 1.1.1.1'
BGP routing table entry for 1.1.1.1/32
Paths: (6 available, best #6, table Default-IP-Routing-Table)
Advertised to non peer-group peers:
r1(10.0.0.1) r2(10.0.0.2) r3(10.0.0.3) r4(10.0.0.4) r5(10.0.0.5) r6(10.0.0.6) r8(10.0.0.8)
Local, (Received from a RR-client)
12.12.12.12 (metric 20) from r2(10.0.0.2) (10.0.0.2)
Origin IGP, metric 0, localpref 100, valid, internal
AddPath ID: RX 0, TX 8
Last update: Fri Oct 30 18:26:44 2015
[snip]
but once you enable an addpath feature we must display "Advertised to" on a path-by-path basis:
superm-redxp-05# show ip bgp 1.1.1.1/32
BGP routing table entry for 1.1.1.1/32
Paths: (6 available, best #6, table Default-IP-Routing-Table)
Local, (Received from a RR-client)
12.12.12.12 (metric 20) from r2(10.0.0.2) (10.0.0.2)
Origin IGP, metric 0, localpref 100, valid, internal
AddPath ID: RX 0, TX 8
Advertised to: r8(10.0.0.8)
Last update: Fri Oct 30 18:26:44 2015
Local, (Received from a RR-client)
34.34.34.34 (metric 20) from r3(10.0.0.3) (10.0.0.3)
Origin IGP, metric 0, localpref 100, valid, internal
AddPath ID: RX 0, TX 7
Advertised to: r8(10.0.0.8)
Last update: Fri Oct 30 18:26:39 2015
Local, (Received from a RR-client)
56.56.56.56 (metric 20) from r6(10.0.0.6) (10.0.0.6)
Origin IGP, metric 0, localpref 100, valid, internal
AddPath ID: RX 0, TX 6
Advertised to: r8(10.0.0.8)
Last update: Fri Oct 30 18:26:39 2015
Local, (Received from a RR-client)
56.56.56.56 (metric 20) from r5(10.0.0.5) (10.0.0.5)
Origin IGP, metric 0, localpref 100, valid, internal
AddPath ID: RX 0, TX 5
Advertised to: r8(10.0.0.8)
Last update: Fri Oct 30 18:26:39 2015
Local, (Received from a RR-client)
34.34.34.34 (metric 20) from r4(10.0.0.4) (10.0.0.4)
Origin IGP, metric 0, localpref 100, valid, internal
AddPath ID: RX 0, TX 4
Advertised to: r8(10.0.0.8)
Last update: Fri Oct 30 18:26:39 2015
Local, (Received from a RR-client)
12.12.12.12 (metric 20) from r1(10.0.0.1) (10.0.0.1)
Origin IGP, metric 0, localpref 100, valid, internal, best
AddPath ID: RX 0, TX 3
Advertised to: r1(10.0.0.1) r2(10.0.0.2) r3(10.0.0.3) r4(10.0.0.4) r5(10.0.0.5) r6(10.0.0.6) r8(10.0.0.8)
Last update: Fri Oct 30 18:26:34 2015
superm-redxp-05#
Ticket: CM-6789
Reviewed By: CCR-3263
Testing Done: Manual Testing and smoke tests
Whenever some sort of output is encountered, added a json version with
proper logic as well.
This adds support for BGP RFC 5549 (Extended Next Hop Encoding capability)
* send and receive of the capability
* processing of IPv4->IPv6 next-hops
* for resolving these IPv6 next-hops, itsworks with the current
next-hop-tracking support
* added a new message type between BGP and Zebra for such route
install/uninstall
* zserv side of changes to process IPv4 prefix ->IPv6 next-hops
* required show command changes for IPv4 prefix having IPv6 next-hops
Few points to note about the implementation:
* It does an implicit next-hop-self when a [IPv4 prefix -> IPv6 LL next-hop]
is to be considered for advertisement to IPv4 peering (or IPv6 peering
without Extended next-hop capability negotiated)
* Currently feature is off by default, enable it by configuring
'neighbor <> capability extended-nexthop'
* Current support is for IPv4 Unicast prefixes only.
IMPORTANT NOTE:
This patch alone isn't enough to have IPv4->IPv6 routes installed into
the kernel. A separate patch is needed for that to work for the netlink
interface.
Signed-off-by: Vipin Kumar <vipin@cumulusnetworks.com>
Reviewed-by: Daniel Walton <dwalton@cumulusnetworks.com>
Vivek Venkatraman <vivek@cumulusnetworks.com>
Donald Sharp <sharpd@cumulusnetworks.com>
BGP: Fix network import check use with NHT instead of scanner
When next hop tracking was implemented and the bgp scanner was eliminated,
the "network import-check" command got broken. This patch fixes that
issue. NHT is used to not just track nexthops, but also the static routes
that are announced as part of BGP's network command. The routes are
registered only when import-check is enabled. To optimize performance,
we register static routes only when import-check is enabled.
Signed-off-by: Dinesh G Dutt <ddutt@cumulusnetworks.com>
This patch implements the 'update-groups' functionality in BGP. This is a
function that can significantly improve BGP performance for Update generation
and resultant network convergence. BGP Updates are formed for "groups" of
peers and then replicated and sent out to each peer rather than being formed
for each peer. Thus major BGP operations related to outbound policy
application, adj-out maintenance and actual Update packet formation
are optimized.
BGP update-groups dynamically groups peers together based on configuration
as well as run-time criteria. Thus, it is more flexible than update-formation
based on peer-groups, which relies on operator configuration.
[Note that peer-group based update formation has been introduced into BGP by
Cumulus but is currently intended only for specific releases.]
From 11098af65b2b8f9535484703e7f40330a71cbae4 Mon Sep 17 00:00:00 2001
Subject: [PATCH] updgrp commits
——————————————-------------
- etc/init.d/quagga is modified to support creating separate ospf daemon
process for each instance. Each individual instance is monitored by
watchquagga just like any protocol daemons.(requires initd-mi.patch).
- Vtysh is modified to able to connect to multiple daemons of the same
protocol (supported for OSPF only for now).
- ospfd is modified to remember the Instance-ID that its invoked with. For
the entire life of the process it caters to any command request that
matches that instance-ID (unless its a non instance specific command).
Routes/messages to zebra are tagged with instance-ID.
- zebra route/redistribute mechanisms are modified to work with
[protocol type + instance-id]
- bgpd now has ability to have multiple instance specific redistribution
for a protocol (OSPF only supported/tested for now).
- zlog ability to display instance-id besides the protocol/daemon name.
- Changes in other daemons are to because of the needed integration with
some of the modified APIs/routines. (Didn’t prefer replicating too many
separate instance specific APIs.)
- config/show/debug commands are modified to take instance-id argument
as appropriate.
Guidelines to start using multi-instance ospf
---------------------------------------------
The patch is backward compatible, i.e for any previous way of single ospf
deamon(router ospf <cr>) will continue to work as is, including all the
show commands etc.
To enable multiple instances, do the following:
1. service quagga stop
2. Modify /etc/quagga/daemons to add instance-ids of each desired
instance in the following format:
ospfd=“yes"
ospfd_instances="1,2,3"
assuming you want to enable 3 instances with those instance ids.
3. Create corresponding ospfd config files as ospfd-1.conf, ospfd-2.conf
and ospfd-3.conf.
4. service quagga start/restart
5. Verify that the deamons are started as expected. You should see
ospfd started with -n <instance-id> option.
ps –ef | grep quagga
With that /var/run/quagga/ should have ospfd-<instance-id>.pid and
ospfd-<instance-id>/vty to each instance.
6. vtysh to work with instances as you would with any other deamons.
7. Overall most quagga semantics are the same working with the instance
deamon, like it is for any other daemon.
NOTE:
To safeguard against errors leading to too many processes getting invoked,
a hard limit on number of instance-ids is in place, currently its 5.
Allowed instance-id range is <1-65535>
Once daemons are up, show running from vtysh should show the instance-id
of each daemon as 'router ospf <instance-id>’ (without needing explicit
configuration)
Instance-id can not be changed via vtysh, other router ospf configuration
is allowed as before.
Signed-off-by: Vipin Kumar <vipin@cumulusnetworks.com>
Reviewed-by: Daniel Walton <dwalton@cumulusnetworks.com>
Reviewed-by: Dinesh G Dutt <ddutt@cumulusnetworks.com>
- Schedule write thread for advertisements and withdraws only if corresponding
FIFOs are growing and/or upon work_queue getting fully processed.
- Set non-default yield time for the main work_queue, as the default value
of 10ms results in yielding after processing very few nodes.
- Remove unnecessary scheduling of write thread when update packet is formed.
- If MRAI is 0, don't start a timer unnecessarily, directly schedule write
thread.
- Some debugs.
Credit
------
A huge amount of credit for this patch goes to Piotr Chytla for
their 'route tags support' patch that was submitted to quagga-dev
in June 2007.
Documentation
-------------
All ipv4 and ipv6 static route commands now have a "tag" option
which allows the user to set a tag between 1 and 65535.
quagga(config)# ip route 1.1.1.1/32 10.1.1.1 tag ?
<1-65535> Tag value
quagga(config)# ip route 1.1.1.1/32 10.1.1.1 tag 40
quagga(config)#
quagga# show ip route 1.1.1.1/32
Routing entry for 1.1.1.1/32
Known via "static", distance 1, metric 0, tag 40, best
* 10.1.1.1, via swp1
quagga#
The route-map parser supports matching on tags and setting tags
!
route-map MATCH_TAG_18 permit 10
match tag 18
!
!
route-map SET_TAG_22 permit 10
set tag 22
!
BGP and OSPF support:
- matching on tags when redistribing routes from the RIB into BGP/OSPF.
- setting tags when redistribing routes from the RIB into BGP/OSPF.
BGP also supports setting a tag via a table-map, when installing BGP
routes into the RIB.
Signed-off-by: Daniel Walton <dwalton@cumulusnetworks.com>
BGP: Add match interface support to BGP route-map.
Currently, BGP route maps don't support interface match. This is a problem
for commands such as redistribite connected that cannot exclude routes from
specific interfaces (such as mgmt interfaces).
BGP: Event-driven route announcement taking into account min route advertisement interval
ISSUE
BGP starts the routeadv timer (peer->t_routeadv) to expire in 1 sec
when a peer is established. From then on, the timer expires
periodically based on the configured MRAI value (default: 30sec for
EBGP, 5sec for IBGP). At the expiry, the write thread is triggered
that takes the routes from peer's sync FIFO (adj-rib-out) and sends
UPDATEs. This has a few drawbacks:
(1) Delay in new route announcement: Even when the last UPDATE message
was sent a while back, the next route change will necessarily have
to wait for routeadv expiry
(2) CPU usage: The timer is always armed. If the operator chooses to
configure a lower value of MRAI (zero second is a preferred choice
in many deployments) for better convergence, it leads to high CPU
usage for BGP process, even at the times of no network churn.
PATCH
Make the route advertisement event-driven - When routes are added to
peer's sync FIFO, check if the routeadv timer needs to be adjusted (or
started). Conversely, do not arm the routeadv timer unconditionally.
The patch also addresses route announcements during read-only mode
(update-delay). During read-only mode operation, the routeadv timer
is not started. When BGP comes out of read-only mode and all the
routes are processed, the timer is started for all peers with zero
expiry, so that the UPDATEs can be sent all at once. This leads to
(near-)optimal UPDATE packing.
Finally, the patch makes the "max # packets to write to peer socket at
a time" configurable. Currently it is hard-coded to 10. The command is
at the top router-bgp mode and is called "write-quanta <number>". It
is a useful convergence parameter to tweak.
Signed-off-by: Pradosh Mohapatra <pmohapat@cumulusnetworks.com>
Reviewed-by: Daniel Walton <dwalton@cumulusnetworks.com>
COMMAND:
table-map <route-map-name>
DESCRIPTION:
This feature is used to apply a route-map on route updates from BGP to Zebra.
All the applicable match operations are allowed, such as match on prefix,
next-hop, communities, etc. Set operations for this attach-point are limited
to metric and next-hop only. Any operation of this feature does not affect
BGPs internal RIB.
Supported for ipv4 and ipv6 address families. It works on multi-paths as well,
however, metric setting is based on the best-path only.
IMPLEMENTATION NOTES:
The route-map application at this point is not supposed to modify any of BGP
route's attributes (anything in bgp_info for that matter). To achieve that,
creating a copy of the bgp_attr was inevitable. Implementation tries to keep
the memory footprint low, code comments do point out the rationale behind a
few choices made.
bgp_zebra_announce() was already a big routine, adding this feature would
extend it further. Patch has created a few smaller routines/macros whereever
possible to keep the size of the routine in check without compromising on the
readability of the code/flow inside this routine.
For updating a partially filtered route (with its nexthops), BGP to Zebra
replacement semantic of the next-hops serves the purpose well. However, with
this patch there could be some redundant withdraws each time BGP announces a
route thats (all the nexthops) gets denied by the route-map application.
Handling of this case could be optimized by keeping state with the prefix and
the nexthops in BGP. The patch doesn't optimizing that case, as even with the
redundant withdraws the total number of updates to zebra are still be capped
by the total number of routes in the table.
Signed-off-by: Vipin Kumar <vipin@cumulusnetworks.com>
Reviewed-by: Pradosh Mohapatra <pmohapat@cumulusnetworks.com>
quagga: nexthop-tracking.patch
Add next hop tracking support to Quagga. Complete documentation in doc/next-hop-tracking.txt.
Signed-off-by: Pradosh Mohapatra <pmohapat@cumulusnetworks.com>
Signed-off-by: Daniel Walton <dwalton@cumulusnetworks.com>
Signed-off-by: Dinesh Dutt <ddutt@cumulusnetworks.com>
COMMAND:
'update-delay <max-delay in seconds> [<establish-wait in seconds>]'
DESCRIPTION:
This feature is used to enable read-only mode on BGP process restart or when
BGP process is cleared using 'clear ip bgp *'. When applicable, read-only mode
would begin as soon as the first peer reaches Established state and a timer
for <max-delay> seconds is started.
During this mode BGP doesn't run any best-path or generate any updates to its
peers. This mode continues until:
1. All the configured peers, except the shutdown peers, have sent explicit EOR
(End-Of-RIB) or an implicit-EOR. The first keep-alive after BGP has reached
Established is considered an implicit-EOR.
If the <establish-wait> optional value is given, then BGP will wait for
peers to reach establish from the begining of the update-delay till the
establish-wait period is over, i.e. the minimum set of established peers for
which EOR is expected would be peers established during the establish-wait
window, not necessarily all the configured neighbors.
2. max-delay period is over.
On hitting any of the above two conditions, BGP resumes the decision process
and generates updates to its peers.
Default <max-delay> is 0, i.e. the feature is off by default.
This feature can be useful in reducing CPU/network used as BGP restarts/clears.
Particularly useful in the topologies where BGP learns a prefix from many peers.
Intermediate bestpaths are possible for the same prefix as peers get established
and start receiving updates at different times. This feature should offer a
value-add if the network has a high number of such prefixes.
IMPLEMENTATION OBJECTIVES:
Given this is an optional feature, minimized the code-churn. Used existing
constructs wherever possible (existing queue-plug/unplug were used to achieve
delay and resume of best-paths/update-generation). As a result, no new
data-structure(s) had to be defined and allocated. When the feature is disabled,
the new node is not exercised for the most part.
Signed-off-by: Vipin Kumar <vipin@cumulusnetworks.com>
Reviewed-by: Pradosh Mohapatra <pmohapat@cumulusnetworks.com>
Dinesh Dutt <ddutt@cumulusnetworks.com>