Routes without nexthops don't make any sense, so we need to reject
them otherwise weird things can happen.
NOTE: blackhole routes aren't nexthop-less, they do have a single
nexthop of type NEXTHOP_TYPE_BLACKHOLE.
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
Some daemons like ospfd and isisd have the ability to advertise a
default route to their peers only if one exists in the RIB. This
is what the "default-information originate" commands do when used
without the "always" parameter.
For that to work, these daemons use the ZEBRA_REDISTRIBUTE_DEFAULT_ADD
message to request default route information to zebra. The problem
is that this message didn't have an AFI parameter, so a default route
from any address-family would satisfy the requests from both daemons
(e.g. ::/0 would trigger ospfd to advertise a default route to its
peers, and 0.0.0.0/0 would trigger isisd to advertise a default route
to its IPv6 peers).
Fix this by adding an AFI parameter to the
ZEBRA_REDISTRIBUTE_DEFAULT_{ADD,DELETE} messages and making the
corresponding code changes.
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
Future commits are going to introduce more rigor in
state setting in the case of received results from
the data plane. So let us move the DPLANE_OP_ROUTE_DELETE
state check to the same spot as the rest of the code that
is handling a particular operation.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Modify the status flag from 8 bits to 32 bits and to add
a few new flags that will be used in future commits.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Modify the meta_queue insertion such that we only enqueue
the route_node into one meta_queue instead of several.
Suppose we have multiple route_entries associated with
a particular node from rip, bgp, staticd. If we receive a
route update from rip, we would enqueue the route_node into
the 1, 2, 3 meta-nodes. Which means that we would run
the entire process of figuring out a route 3 times, while
nothing would change the second two times.
Modify the code to choose the lowest meta-queue and
install it into that one for processing.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
When a dataplane provider/plugin registers, return the new
handle/object - that's needed to use some provider apis
later on.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
Pass lists of results back to zebra from the dataplane subsystem
(and pthread). This helps reduce the lock/unlock cycles when
zebra is busy. Also remove a couple of typedefs that made their
way into the dataplane header file - those violate the FRR style
guidelines.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
if the default vrf name is manually set, by passing -o parameter to
zebra, then this should be detected when walking the list of netns
available in the system. If a netns called vrf0 is present, then it
should be ignored.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
when zebra is run, by using vrf netns backend mode, then the parser
detector of netns is run before forcing the default vrf to a possible
value. In that case, there is a possibility that the forced '-o' option
will create a second vrf with same name, whereas this option should be
there to uniquely have a default vrf with a value.
To make things consistent, the forced value will be priorised. Then, the
notifier will attempt to create vrf contexts. The expectation is that
the creation will fail, due to an already present vrf with same name.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
When an empty netmask a wrong end size is calculated, lets handle this
corner case to avoid spurious warning messages.
Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
Handle corner case where a warning log message is issued on interface
address netmask handling with sockaddr type AF_LINK: it may come empty
or with match all (all 0xFF).
In the first case all lengths are zero and we only need to copy the
first bytes, second case it comes with a zero index and all 0xFF bytes.
In any case we only need to figure out a few of the first bytes instead
of all data.
Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
When porting routing socket macro data handling to functions, the
attribute function was forgotten. The only difference between the
attribute and address handler is the family type check.
Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
"brief" output for "show interface" helps when we have to quickly check
important information like ip address, vrf etc. This prints
information in the easy to read tabular format. Currently it prints oper
status, ifname, vrf, ipv4 and ipv6 addresses.
Ticket: CM-9109
Signed-off-by: Nitin Soni <nsoni@cumulusnetworks.com>
For neigh check duplicate flag as it can be inherited from
duplicate detected MAC (count could be 0).
Ticket:CM-23316
Signed-off-by: Chirag Shah <chirag@cumulusnetworks.com>
Below are cases where EVPN duplicate detection
Freeze and Unfreeze required fixes:
Auto recovery needs to check neighbor's duplicate flag
to take action, as neigh could be marked duplicate
via inherited from MAC where IP detection count could be 0.
MAC duplicate detection needs to set flag to true
if freeze action is configured.
Local MAC add update should not send update to bgp
if MAC is in frozen state.
Remote MAC-IP update should not process neigh update if MAC
is detected as duplicate during remote update.
Ticket:CM-23344
Testing Done:
Trigger duplicate detection via both local and remote update trigger,
Validate clear command and other changes expected behavior.
Auto-recovery takes appropriate action on inherited IPs.
Signed-off-by: Chirag Shah <chirag@cumulusnetworks.com>
Add the ability to retrieve the current role of mlag for this machine.
If mlag is not setup we will always return MLAG_ROLE_NONE.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The zebra_delete_rnh function is not needed to be exposed
to the entire world. Limit it's scope.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The deletion of a rnh is always proceeded by the same checks
to see if it is done. Just let zebra_delete_rnh do this test.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
When we call zebra_vrf_table_create, we've already created the info
pointer in zebra_router_get_table, so properly set the info->safi
and just store the zvrf->table[afi][safi] value.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
When handling events from /var/run/netns folder, if several netns are
removed at the same time, only the first one is deleted in the frr. Fix
this behaviour by applying continue in the loop.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Duplicate address detection should operate
at default vrf instance.
For mac and neigh show command, auto recovery and few places
where tanent vrf_id used for zvrf instead use default
vrf instance. Use vxlan_if's or VRF_DEFAULT vrf_id to
fetch zebra's default vrf instance.
Signed-off-by: Chirag Shah <chirag@cumulusnetworks.com>
zebra uses the SIOCETHTOOL ioctl with the ETHTOOL_GSET command to
fetch the speed of interfaces from the kernel. The only problem is
that ETHTOOL_GSET returns EOPNOTSUPP when the given interface is a
virtual interface. This leads to zebra emitting warnings like this
at startup:
ZEBRA: IOCTL failure to read interface lo speed: 95 Operation not supported
ZEBRA: IOCTL failure to read interface dummy0 speed: 95 Operation not supported
ZEBRA: IOCTL failure to read interface ovs-system speed: 95 Operation not supported
Silence these warnings by ignoring EOPNOTSUPP errors, since we know
they are harmless. This is similar to how we handle EINVAL errors
from the BSD SIOCGIFMEDIA ioctl (commit c69f2c1ff).
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
Unlike the other interface zapi messages, ZEBRA_INTERFACE_VRF_UPDATE
identifies interfaces using ifindexes and not interface names. This
is a problem because zebra always sends ZEBRA_INTERFACE_DOWN
and ZEBRA_INTERFACE_DELETE messages before sending
ZEBRA_INTERFACE_VRF_UPDATE, and the ZEBRA_INTERFACE_DELETE callback
from all daemons set the interface index to IFINDEX_INTERNAL. Hence,
when decoding a ZEBRA_INTERFACE_VRF_UPDATE message, the interface
lookup would always fail since the corresponding interface lost
its ifindex. Example (ospfd):
OSPF: Zebra: Interface[rt1-eth2] state change to down.
OSPF: Zebra: interface delete rt1-eth2 vrf default[0] index 8 flags 11143 metric 0 mtu 1500
OSPF: [EC 100663301] INTERFACE_VRF_UPDATE: Cannot find IF 8 in VRF 0
To fix this problem, use interface names instead of ifindexes to
indentify interfaces like the other interface zapi messages do.
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
The route_info data structure already had a mapping of route type
to admin distance. Consolidate the meta_queue_map information
into this route_info data structure. This is to reduce the number
of places we need to remember to touch when adding a new routing
protocol.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
An EVPN type-2 entry is in freeze state during remote update,
remote VTEP can send typ-2 withdraw update,
upon receiving an entry delete (withdraw), first check
kernel has in local reachable state. Upon
unfreeze use the local entry to advertise to peers.
Fetch is for both MAC and IP, delete can come for
only MAC or MAC-IP combined route.
The specific entry fetch only required request flag to be set,
dump flag is not required.
Testing Done:
Simulate two VTEPs to do M1, IP1 mobility sequence,
freeze MAC during remote MAC update, subsequently send
withdraw type-2 route from origintating VTEP.
This results in read apis to invoke for local reachable entry.
Zebra updates its cache and upon unfreeze originates type-2.
Signed-off-by: Chirag Shah <chirag@cumulusnetworks.com>
Make netlink_request api generic where it can be used
for dump or querying specific information request.
nelink request nlm flags (NLM_F_ROOT | NLM_F_MATCH) are
used to dump purpose, if client wants to query spcific
MAC or IP using netlink_request does not require to set
them.
nlm struct is passed by the caller of netlink_request,
it can also set the nlm request flags.
Signed-off-by: Chirag Shah <chirag@cumulusnetworks.com>
This commit is the last missing piece to complete BGP LU support in bgpd. To this moment, bgpd (and zebra) supported auto label assignment only for prefixes leaked from VRFs to vpn and for MPLS SR prefixes. This adds auto label assignment to other routes types in bgpd. The following enhancements have been made:
* bgp_route.c:bgp_process_main_one() now sets implicit-null local_label to all local, aggregate and redistributed routes.
* bgp_route.c:bgp_process_main_one() now will request a label from the label pool for any prefix that loses the label for some reason (for example, when the static label assignment config is removed)
* bgp_label.c:bgp_reg_dereg_for_label() now requests labels from label pool for routes which have no associated label index
* zebra_mpls.c:zebra_mpls_fec_register() now expects both label and label_index from the calling function, one of which must be set to MPLS_INVALID_LABEL or MPLS_INVALID_LABEL_INDEX, based on this it will decide how to register the provided FEC.
Signed-off-by: Anton Degtyarev <anton@cumulusnetworks.com>
Reduce the zebra rib workqueue retry timeout, used when the queue
towards the zebra dataplane has reached its limit. Lowering the
value was reported to improve update throughput on some platforms.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
The label processing for socket installs was not ensuring
that each nexthop would not accidently use the last
nexthops value.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The test we were using to ensure that a mask was sent in
is a bit redundant, let's just always send it in.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The ADD/DELETE messages are the only ones we support, so leave
early from the function, in other words don't check it every
nexthop loop.
Additionally nexthops only care about non recursive active flags.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
I'm going to rearrage the kernel_rtm_ipv4 and v6 functions
so the sin6_masklen needs to be moved a bit earlier.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The write function converted to v4 and v6 functions to a union sockunion
via casting. Just use `union sockunion` instead.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Allow the ns deletion event to happen *after* the data validity
checks.
Please note this probably still leaves a weird hole if we receive
multiple namespace events ( as the for loop implies ). We will
stop handling anything after a namespace deletion notification.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
the default vrf name was hardset to "Default", whereas the default vrf
name could have been configured in an other manner. Fix this
inconsistency.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
the l3vni structure is allocated only once, since that structure is only
used for default netns. For that, move the initialisation part is moved
to a proper place, where there is no risk of attempting to initialise it
more than once, even when vrf backend is netns.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
When a route removal failure happens return to the installing
protocol that the route deletion failed.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
In the zebra rib processing workqueue, set a small timeout
so that we will wait a short time if the queue into the
async dataplane is full. This helps avoid a situation where
the zebra main pthread constantly retries rib work without
giving the dataplane pthread a chance to make progress.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
NEXTHOP_FLAG_ACTIVE currently means that the nexthop is considered
good enough to be installed. With current ecmp restrictions this
translation from multipath_num is enforced in the data plane.
The problem with this is of course that every data plane now
becomes concerned about the multipath num and must enforce it
independently. Currently *bsd does not honor multipath_num at
all and linux marks all nexthops as being installed even when
it honors a multipath_num that is less than the total.
This code change moves the multipath_num enforcement from a dataplane
decision to a zebra nexthop decision. Thus dataplanes now can
just install those nexthops marked as NEXTHOP_FLAG_ACTIVE
without having to worry about multipath_num.
*BSD will now respect multipath_num and Linux now properly notes
which routes are actually installed or not:
sharpd@donna ~/f/t/topotests> ps -ef | grep frr
frr 6261 1556 0 09:12 ? 00:00:00 /usr/lib/frr/zebra -e 2 --daemon -A 127.0.0.1
frr 6279 1556 0 09:12 ? 00:00:00 /usr/lib/frr/staticd --daemon -A 127.0.0.1
donna.cumulusnetworks.com(config)# do show ip route
Codes: K - kernel route, C - connected, S - static, R - RIP,
O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP,
T - Table, v - VNC, V - VNC-Direct, A - Babel, D - SHARP,
F - PBR, f - OpenFabric,
> - selected route, * - FIB route
K>* 0.0.0.0/0 [0/106] via 10.0.2.2, enp0s3, 00:00:45
S>* 4.4.4.4/32 [1/0] via 10.0.2.1, enp0s3, 00:00:02
* via 192.168.209.1, enp0s8, 00:00:02
via 192.168.210.1, enp0s9 inactive, 00:00:02
C>* 10.0.2.0/24 is directly connected, enp0s3, 00:00:45
C>* 192.168.209.0/24 is directly connected, enp0s8, 00:00:45
C>* 192.168.210.0/24 is directly connected, enp0s9, 00:00:45
donna.cumulusnetworks.com(config)#
sharpd@donna ~/f/t/topotests> ip route show
default via 10.0.2.2 dev enp0s3 proto dhcp metric 106
4.4.4.4 proto 196 metric 20
nexthop via 10.0.2.1 dev enp0s3 weight 1
nexthop via 192.168.209.1 dev enp0s8 weight 1
10.0.2.0/24 dev enp0s3 proto kernel scope link src 10.0.2.15 metric 106
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1 linkdown
192.168.122.0/24 dev virbr0 proto kernel scope link src 192.168.122.1 linkdown
192.168.209.0/24 dev enp0s8 proto kernel scope link src 192.168.209.2 metric 105
192.168.210.0/24 dev enp0s9 proto kernel scope link src 192.168.210.2 metric 103
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Stop creating individual, one-time events as each batch of
incoming zserv/zapi messages is processed - use a singleton
event so that the incoming message activity is more fair if
the zebra main pthread has other events to run.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
We never used this information and it was merely stored.
Additionally this is not something that is a flag, it's
a status.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Make the v4 and v6 code paths for rib_XXX calls in kernel_socket
as similiar as we can possibly make them. There is no need
for code duplication at this point in time.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The rib_lookup_ipv4_route function is only used in a debug path.
Is only used for v4 and only checks to make sure that the rib
and fib are in sync( which is not needed/used/supported on other
platforms ). So let's just remove it.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
For nexthop handling use the actual resolved nexthop.
Nexthops are stored as a `special` list:
Suppose we have 3 way ecmp A, B, C:
nhop A -> resolves to nhop D
|
nhop B
|
nhop C -> resolves to nhop E
A and C are typically NEXTHOP_TYPE_IPV4( or 6 ) if they recursively resolve
We do not necessarily store the ifindex that this resolves to.
Current nexthop code only loops over A,B and C and uses those for
the zebra_rnh.c handling. So interested parties might receive non-fully
resolved nexthops( and they assume they are! ).
Let's convert the looping to go over all nexthops and only deal with
the resolved ones, so we will look at and use D,B and E.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The `show ip route A.B.C.D json` command was only displaying
the last route entry looked at and we would drop the data
associated with other route entries. This fixes the issue:
robot# show ip route
Codes: K - kernel route, C - connected, S - static, R - RIP,
O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP,
T - Table, v - VNC, V - VNC-Direct, A - Babel, D - SHARP,
F - PBR, f - OpenFabric,
> - selected route, * - FIB route
K>* 0.0.0.0/0 [0/100] via 192.168.201.1, enp3s0, 00:13:31
C>* 4.50.50.50/32 is directly connected, lo, 00:13:31
D 10.0.0.1/32 [150/0] via 192.168.201.1, enp3s0, 00:09:46
S>* 10.0.0.1/32 [1/0] via 192.168.201.1, enp3s0, 00:10:04
C>* 192.168.201.0/24 is directly connected, enp3s0, 00:13:31
robot# show ip route 10.0.0.1 json
{
"10.0.0.1\/32":[
{
"prefix":"10.0.0.1\/32",
"protocol":"sharp",
"distance":150,
"metric":0,
"internalStatus":0,
"internalFlags":1,
"uptime":"00:09:50",
"nexthops":[
{
"flags":1,
"ip":"192.168.201.1",
"afi":"ipv4",
"interfaceIndex":2,
"interfaceName":"enp3s0",
"active":true
}
]
},
{
"prefix":"10.0.0.1\/32",
"protocol":"static",
"selected":true,
"distance":1,
"metric":0,
"internalStatus":0,
"internalFlags":2064,
"uptime":"00:10:08",
"nexthops":[
{
"flags":3,
"fib":true,
"ip":"192.168.201.1",
"afi":"ipv4",
"interfaceIndex":2,
"interfaceName":"enp3s0",
"active":true
}
]
}
]
}
robot#
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
When using `SIOCGIFMEDIA` check for `EINVAL`, otherwise we might print
an error message on an unsupported interface.
FreeBSD source code reference:
https://github.com/freebsd/freebsd/blob/master/sys/net/if_media.c#L300
And:
8cb4b0c018/usr.sbin/rtsold/if.c (L211)
/*
* EINVAL simply means that the interface does not support
* the SIOCGIFMEDIA ioctl. We regard it alive.
*/
Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
Some address types were not being skipped triggering a warning log
message, so lets refactor this code to properly handle known and unknown
types.
Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
Move the declaration of ROUNDUP and ROUND_TYPE to outside of
`ifdef SA_SIZE`. We'll use these definitions in the next commit.
Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
Clear dup address vni needs to return non-zero value
in case of command is not successful.
Ticket:CM-23122
Testing Done:
run clear command and check upon failure return code is non-zero.
root@TORS1:~# vtysh -c "clear evpn dup-addr vni 1000 ip 45.0.1.26"
% Requested IP's associated MAC 00:01:02:03:04:05 is still in duplicate
% state
root@TORS1:~# echo $?
1
Signed-off-by: Chirag Shah <chirag@cumulusnetworks.com>
Problem reported that kernel neighbor entries could end up in "FAILED"
state when the neighbor entry was deleted. This fix handles the
notification of the event from netlink messages and re-inserts the
deleted entry.
Signed-off-by: Don Slice <dslice@cumulusnetworks.com>
Always resend the nexthop information when we get a registration
event. Multiple daemons expect this information.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com.
Change helps display detailed output for all possible VNI neighbors
without specifying VNI and ip. It helps in troubleshooting as a single
command can be fired to capture detailed info on all VNIs.
Ticket: CM-22832
Signed-off-by: Nitin Soni <nsoni@cumulusnetworks.com>
Reviewed-by: CCR-8034
Change helps display detailed output for all possible VNI MACs without
specifying VNI or mac. It helps in troubleshooting - a single
command can be fired to capture detailed info on all VNIs.
Also fixed and existing json related bug where json object is created by
a parent function and freed in child function.
Ticket: CM-22832
Signed-off-by: Nitin Soni <nsoni@cumulusnetworks.com>
Reviewed-by: CCR-8028
A while ago all FRR configuration commands were converted to use the
QOBJ infrastructure to keep track of configuration objects. This
means the configuration lock isn't necessary anymore because the
QOBJ code detects when someones tries to edit a configuration object
that was deleted and react accordingly (log an error and abort the
command). The possibility of accessing dangling pointers doesn't
exist anymore since vty->index was removed.
Summary of the changes:
* remove the configuration lock and the vty_config_lockless() function.
* rename vty_config_unlock() to vty_config_exit() since we need to
clean up a few things when exiting from the configuration mode.
* rename vty_config_lock() to vty_config_enter() to remove code
duplication that existed between the three different "configuration"
commands (terminal, private and exclusive).
Configuration commands converted to the new northbound model don't
need the configuration lock either since the northbound API also
detects when someone tries to edit a configuration object that
doesn't exist anymore.
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
the vrf context was not created at previous location of the call.
The call is done after vrf initialisation.
PR=61513
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Acked-by: Nicolas dichtel <nicolas.dichtel@6wind.com>
the netns discovery process executed when vrf backend is netns, allows
the zebra daemon to dynamically change the default vrf name value. This
option is disabled, when the zebra is forced to a default vrf value with
option -o.
PR=61513
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Change helps display detailed output for all possible VNIs without
specifying VNI. It helps in troubleshooting - a single command can
be fired to capture detailed info on all VNIs.
Ticket: CM-22831
Signed-off-by: Nitin Soni <nsoni@cumulusnetworks.com>
Reviewed-by: CCR-8013
To avoid conflicts between the zebra main pthread and the
dataplane pthread, use a separate routing socket (on non-netlink
platforms) for dataplane route updates to the OS.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
Use a separate netlink socket for the dataplane's updates, to
avoid races between the dataplane pthread and the zebra main
pthread. Revise zebra shutdown so that the dataplane netlink
socket is cleaned-up later, after all shutdown-time dataplane
work has been done.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
Improve, simplify dataplane provider locking apis. Add accessor
for dataplane pthread's thread_master, for use by providers who
need to use the thread/event apis.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
Update the dataplane shutdown checks to include the providers.
Also revise the typedef for provider structs to make const
work.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
Limit the number of updates processed from the incoming queue;
add more stats. Fill out apis for dataplane providers; convert
route update processing to provider model; move dataplane
status enum
Signed-off-by: Mark Stapp <mjs@voltanet.io>
Display following Per MAC and Neigh's output:
If duplicate address detection is under process,
display detection start time and detection count.
If duplicate address detection detected an address
as duplicate, display detection time and duplicate
status.
Signed-off-by: Chirag Shah <chirag@cumulusnetworks.com>
The if_is_loopback() function is the right abstraction for identifying
loopback interfaces. There should be no reason for not using it in the
router-id code.
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
It's been a year since we added the new optional parameters
to instantiation. Let's switch over to the new name.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
When the remote mac is deleted by bgpd we can end up with an auto mac
entry in zebra if there are neighs referring to the mac. The remote sequence
number in the auto mac entry needs to be reset to 0 as the mac entry may
have been removed on all VTEPs (including the originating one).
Now if the MAC comes back on a remote VTEP it may be added with MM=0 which
will NOT be accepted if the remote seq was not reset in the previous step.
Ticket: CM-22707
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
This is a fixup to commit -
f32ea5c07 - zebra: act on kernel notifications for remote neighbors
The original commit handled a race condition between kernel and zebra
that would result in an inconsistent state i.e.
kernel has an offload/remote neigh
zebra has a local neigh
The original commit missed setting the neigh to active when zebra
tried to resolve the inconsistency by modifying the local neigh to
remote neigh on hearing back its own kernel update. Fixed here.
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
Ticket: CM-22700
When events cross paths between bgp and zebra bgpd could end up with a
dangling local MAC entry. Consider the following sequence of events on
rack-1 -
1. MAC1 has MM sequence number 1 and points to rack-3
2. Now a packet is rxed locally on rack-1 and rack-2 (simultaneously) with
source-mac=MAC1.
3. This would cause rack-1 and rack-2 to set the MM seq to 2 and
simultaneously report the MAC as local.
4. Now let's say on rack-1 zebra's MACIP_ADD is in bgpd's queue. bgpd
accepts rack-3's update and sends a remote MACIP add to zebra with MM=2.
5. zebra updates the MAC entry from local=>remote.
6. bgpd now processes zebra's "stale local" making it the best path.
However zebra no longer has a local MAC entry.
At this point bgpd and zebra are effectively out of sync i.e. bgpd has a
local-MAC which is not present in the kernel or in zebra.
To handle this window zebra should send a local MAC delete to bgpd on
modifying its cache to remote.
Ticket: CM-22687
Reviewed By: CCR-7935
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
Current clang has an issue with the pointer/target argument
to at least one atomic/intrinsic. A variable with '_Atomic'
generates a compile-time error. Use a cast as a workaround
here to allow use of clang for now.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
When the rib code is informed that a table is closing/
going away, only try once to uninstall associated routes from
the fib/dataplane. The close path can be called multiple times
in some cases - zebra shutdown, e.g.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
Even if the neighbor entry we want already exists, force its
reinstallation to ensure that it's valid. This will now take place when
we request an update of the neighbor entry.
Ticket: CM-22604
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
Recursive multipath nexthops were broken by the initial async
dataplane - we were trying to install an extra, invalid
nexthop.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
The interface type can be a bond or a bond slave, add some
code to note this and to display it as part of a show interface
command.
Signed-off-by: Dinesh Dutt <didutt@gmail.com>
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
* Correctly set safi to prevent duplicate allocations
* Free previously allocated table->info before overwriting it
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
The frr-interface YANG module models interfaces using a YANG list keyed
by the interface name and the interface VRF. Interfaces can't be keyed
only by their name since interface names might not be globally unique
when the netns VRF backend is in use. When using the VRF-Lite backend,
however, interface names *must* be globally unique. In this case, we need
to validate the uniqueness of interface names inside the appropriate
northbound callback since this constraint can't be expressed in the
YANG language. We must also ensure that only inactive interfaces can be
removed, among other things we need to validate in the northbound layer.
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
Introduce frr-interface.yang, which defines a model for managing FRR
interfaces.
Update the 'frr_yang_module_info' array of all daemons that will
implement this module.
Add automatically generated stub callbacks in if.c. These callbacks will
be implemented in the following commit.
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
FRR_DAEMON_INFO should now contain an array of 'frr_yang_module_info'
structures describing the YANG modules implemented by the daemon.
This array will be used by frr_init() function to load all YANG modules
and initialize the northbound callbacks during the daemon initialization.
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
Avoid running the shutdown/sigint handler code more than once. With
the async dataplane, once shutdown has been initiated, the completion
of all async updates triggers final shutdown of the zebra main
pthread. During that time, avoid taking and processing a second
signal, such as SIGINT or SIGTERM.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
Impose a configurable limit on the number of route updates
that can be queued towards the dataplane subsystem.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
Dplane support for zebra's route cleanup during shutdown (clean
shutdown via SIGINT, anyway.) The dplane has the opportunity to
process incoming updates, and then triggers final cleanup
in zebra's main thread.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
Add first pass at show commands for the zebra dplane. Add some stats
counters to show. Start prep for correct shutdown processing, and for
multiple providers.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
Correct use of netlink_parse_info() in the netlink fuzzing path.
Also clarify a couple of comments about pthreads.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
We need a bit of special handling for system routes, which need
to be offered for redistribution even though they won't be
passing through the dplane system.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
Initial WIP api to add providers into the zebra dataplane system,
with some simple ordering/prioritization.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
Set SELECTED re immediately in rib_process, without expecting
that fib install has completed. Remove premature redistribute
call also.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
Reduce or eliminate use of global zebra_ns structs in
a couple of netlink/kernel code paths, so that those paths
can potentially be made asynch eventually.
Slide netlink_talk_info into place to remove dependency on core
zebra structs; add accessors for dplane context block
Start init of route context from zebra core re and rn structs;
start queueing and event handling for incoming route updates.
Expose netlink apis that don't rely on zebra core structs;
add parallel route-update code path using the dplane ctx;
simplest possible event loop to process queued route'
updates.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
When we fail to install a route into bsd, note the case
where we have no viable nexthops installed for it, so
that we can know in zebra if the route is good or not.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The _wrap_script inclusion implies a certain end functionality
of which we don't care. We just care that the hooks are called.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
These three data structures belong in the `zebra_router` structure
as that they do not belong in `struct zebra_ns`.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Move the rules_hash to the zrouter data structure and provide
the additional bit of work needed to lookup the rule based upon
the namespace id as well. Make the callers of functions not
care about what namespace id we are in.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The `struct zebra_ns` data structure is being used
for both router information as well as support for
the vrf backend( as appropriate ). This is a confusing
state. Start the movement of `struct zebra_ns` into
2 things `struct zebra_router` and `struct zebra_ns`.
In this new regime `struct zebra_router` is purely
for handling data about the router. It has no knowledge
of the underlying representation of the Data Plane.
`struct zebra_ns` becomes a linux specific bit of code
that allows us to handle the vrf backend and is allowed
to have knowledge about underlying data plane constructs.
When someone implements a *bsd backend the zebra_vrf data
structure will need to be abstracted to take advantage of this
instead of relying on zebra_ns.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The ->hash_cmp and linked list ->cmp functions were sometimes
being used interchangeably and this really is not a good
thing. So let's modify the hash_cmp function pointer to return
a boolean and convert everything to use the new syntax.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
We had a variety of issues with sorted list compare functions.
This commit identifies and fixes these issues.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
During a debugging session last night I discovered that I was
still having some `fun` figuring out why zebra was not making
a route's nexthop active. After some debugging I figured out
that I was missing some states that we could end up in that
didn't have debug information about what happened in nexthop_active.
Add the missing breadcrumbs for nexthop resolution. In addition
add a bit of code to notice the ebgp state without recursion turned
on and to let the user know about it.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
on some cases, kernel routes are not selected, because the kernel
suppressed it without informing the netlink layer that the route has
been suppressed ( for instance, when an interface goes down, the route
never goes back when interface goes up again). This commit intends to
suppress that entry from zebra.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Allow the modification of whether or not we will allow
BUM flooding on the vxlan bridge. To do this allow
the upper level protocol to specify via the ZEBRA_VXLAN_FLOOD_CONTROL
zapi message.
If flooding is disabled then BUM traffic will not be forwarded
to other VTEP's.
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Work to handle the route-maps, namely the header changes in zebra_vrf.h
and the mapping of using that everywhere
Signed-off-by: vishaldhingra vdhingra@vmware.com
The condition in the do/while is always false because 'return_nsid' cannot
reach the end of the loop with 'return_nsid' having a different value than
NS_UNKNOWN. Because of that, the condition can be replaced with 0 (false).
Also, the loop can be removed because the two assignments made at the end
of the loop before the condition check are not used (detected via Clang,
afterwards).
Signed-off-by: F. Aragon <paco@voltanet.io>
Conditional code in netlink_macfdb_update() introduced in 2232a77c used
the 'dst_present' variable because not all cases were covered. Now it is
not necessary.
Signed-off-by: F. Aragon <paco@voltanet.io>
Wrapper the get/set of the table->info pointer so that
people are not directly accessing this data.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Unnecesary redeclaration of already-defined enum 'dp_results' removed.
Can be detected via static analysis with e.g.
./configure CFLAGS=-Wgnu-redeclared-enum CC=clang
Signed-off-by: F. Aragon <paco@voltanet.io>
Reduce or eliminate use of global zebra_ns structs in
a couple of netlink/kernel code paths, so that those paths
can potentially be made asynch eventually.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
All I can see is an unneccessary complication. If there's some purpose
here it needs to be documented...
Signed-off-by: David Lamparter <equinox@diac24.net>
When we receive a v6 RA packet with an optional
ND_OPT_SOURCE_LINKADDR take that data and construct the
v4 to v6 neighbor entry for that interface to allow
v4 w/ v6 nexthops to work with only global v6 addresses
on an interface.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Abstract the mac neigh installation for 169.254.0.1 into
it's own function that we can pass the mac address into.
This will allow a future commit to use this functionality
when we have the appropriate mac address from reading
optional attributes of a RA packet.
Signed-off-by: Donald Sharp <sharpd@cumuusnetworks.com>
This change makes the zebra acting as label manager proxy not to relay non-LM
messages to clients that a zebra acting in non-proxy mode may send to it. Also,
the existing code does not schedule a rcv in case of relay_response_back
returns -1. This patch re-schedules reads on the socket even in case such a
function returns -1 by calling thread_add_read().
Signed-off-by: F. Aragon <paco@voltanet.io>
Corrections so that the BGP daemon can work with the label manager properly
through a label-manager proxy. Details:
- Correction so the BGP daemon behind a proxy label manager gets the range
correctly (-I added to the BGP daemon, to set the daemon instance id)
- For the BGP case, added an asynchronous label manager connect command so
the labels get recycled in case of a BGP daemon reconnection. With this,
BGPd and LDPd would behave similarly.
Signed-off-by: F. Aragon <paco@voltanet.io>
The block comments from a couple commits were not following
proper style. Fix.
Fix SA warning that had snuck in.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Netdevices are not sorted in any fashion by the kernel during the initial
interface nldump. So you can get an upper device (such as an SVI) before
its corresponding lower device (bridge).
To fix this problem we skip resolving link dependencies during handling of
nldump notifications. Resolving instead at the end (when all the devices
are present)
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
Ticket: CM-22388, CM-21796
Reviewed By: CCR-7845
Testing Done:
1. verified on a setup with missing linkages
2. automation - evpn-min
Ensure that when the is_router condition changes for a locally learnt
neighbor, it is informed to BGP only if it is active i.e., the MAC is
also locally learnt.
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
Reviewed-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
Reviewed-by: Chirag Shah <chirag@cumulusnetworks.com>
Ticket: CM-22288
Reviewed By: CCR-7832
Testing Done:
1. Failed test
2. vxlan_routing_test.py
Use boolean variables instead of unsigned int for certain VxLAN-EVPN
flags which are really used as boolean.
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
Reviewed-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
Reviewed-by: Chirag Shah <chirag@cumulusnetworks.com>
Ticket: CM-22288
Reviewed By: CCR-7832
Testing Done:
Along with a subsequent, related commit
When a remote MAC goes away, but there are neighbors referring to it,
ensure that when the last remote neighbor goes away, the MAC is
uninstalled from the kernel and no longer considered as remote.
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
Reviewed-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
Reviewed-by: Chirag Shah <chirag@cumulusnetworks.com>
Ticket: CM-22130
Reviewed By: CCR-7777
Testing Done:
1. Replicated failed scenario and verified with fix.
2. evpn-min
When a MAC moves from local to remote, a replace is allowed, EVPN
no longer has to delete the local MAC before installing the remote
MAC.
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
Reviewed-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
Reviewed-by: Chirag Shah <chirag@cumulusnetworks.com>
So the linux kernel uses the RT_TABLE_MAIN for the table
id used for ip routing. The multicast routing tables use
RT_TABLE_DEFAULT. We changed the internal code of zebra_vrf
a few months back to use RT_TABLE_MAIN as the tableid to
use. This caused the pim sg stats to stop working because
of the kernel bug where it uses a different table
for ip routing and ip multicast.
Put a bit of a special case in to do the right thing.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
When debugging the mroute code path in zebra, add a bit of additional
data to allow us to know what is going on a bit more.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Newer linux kernels apparently send data down the netlink
bus for the creation of mroutes. Add a bit of code
to notice this and to handle it appropriately( ie do
nothing at this point in time ) as that the correct
place to do this is in the pim socket in pimd.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
When we are displaying data about a netlink message
in debugs or errors, print out the message type
as a string instead of a number.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>