This crash occurs only with netns implementation.
vrf meaning is different regarging its implementation (netns or
vrf-lite)
- With vrf-lite implementation vrf is a property of the interface that
can be changed as the speed or the state (iproute2 command: "ip link
set dev IF_NAME master VRF_NAME"). All interfaces of the system are in
the same netns and so interface name is unique.
- With netns implementation vrf is a characteristic of the interface
that CANNOT be changed: it is the id of the netns where the interface
is located. To change the vrf of an interface (iproute2 command to
move an interface "ip netns exec VRF_NAME1 ip link set dev IF_NAME
netns VRF_NAME2") the interface is deleted from the old vrf and
created in the new vrf.
Interface name is not unique, the same name can be present in the
different netns (typically the lo interface) and search of interface
must be done by the tuple (interface name, netns id).
Current tests on the vrf implementation (vrf-lite or netns) are not
sufficient. In some cases (for example when an interface is moved from
a vrf X to the default vrf and then move back to VRF X) we can have a
corruption message and then a crash of zebra.
To avoid this corruption test on the vrf implementation, needed when an
interface changes, has been rewritten:
- For all interface changes except deletion the if_get_by_name function,
that checks if an interface exists and creates or updates it if
needed, is changed:
* The vrf-lite implementation is unchanged: search of the interface
is based only on the name and update the vrf-id if needed.
* The netns implementation search of the interface is based on the
(name, vrf-id) tuple and interface is created if not found, the
vrf-id is never updated.
- deletion of an interface (reception of a RTM_DELLINK netlink message):
* The vrf-lite implementation is unchanged: the interface
information are cleared and the interface is moved to the default
vrf if it does not belong to (to allow vrf deletion)
* The netns implementation is changed: only the interface
information are cleared and the interface stays in its vrf to
avoid conflict with interface with the same name in the default
vrf.
This implementation reverts (partially or totally):
commit 393ec5424e ("zebra: fix missing node attribute set in ifp")
commit e9e9b1150f ("lib: create interface even if name is the same")
commit 9373219c67 ("zebra: improve logs when replacing interface to an
other netns")
Fixes: b53686c52a ("zebra: delete interface that disappeared")
Signed-off-by: Thibaut Collet <thibaut.collet@6wind.com>
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
To correct potential crash with netns implementation of vrf (see next
commit) it is necessary to allow any daemons to know the vrf
implementation whatever the vrf.
With current implementation the daemons do not know the vrf
implementation for the default vrf. For this vrf the returned vrf
implementation is always vrf-lite.
To solve this issue a netns name is set to the default vrf to just test
is presence to know the used implementation.
For zebra a netns name (if needed) is set in the vrf_init function just
before enabling the vrf. So this information is propagated to the other
daemons thanks the zapi message called when the vrf is enable at zebra
layer and override the default configuration (vrf-lite) of the daemon.
Signed-off-by: Thibaut Collet <thibaut.collet@6wind.com>
stdatomic.h does not have aliases for all of the useful gcc
atomic primitives; add them in for that path through
frratomic.h.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
We have the fetch_and_xxx apis, which return the _old_ value;
adding the xxx_and_fetch versions, which return the new value.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
There exists a possibility that the ifindex we are passed
does not exist and as such we should check for it not
resolving as part of the debug.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
It is possible to dynamically change default VRF name, if vrf backend is
a netns backend. By creating a link to the default netns in
/var/run/netns folder, then the file name will be used to name the
default VRF. If no backend netns is chosen, it is explained that it is
still possible to statically configure the default vrf name to new
define.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
If default VRF is used, with standard naming convention,
memory allocation can be avoided.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Prevent from creating vrf, if the default vrf name is the same as the
vrf to be created.
Also, prevent at startup from creating default vrf with a name already
used in vrf list.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
For the daemons that do not use vrf_init(), the call to the define
will return a default vrf if no other values has been overriden.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Because a VRF name can be used for default VRF, or an alias of an
already created VRF can be passed as parameter, the default VRF name
must be found out. This avoids creating double BGP instances for
example.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
In the case the default netns has a netns path, then a new NETNS
creation will be bypassed.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
The Vrf aliases can be known with a specific hook. That hook will then,
from zebra propagate the information to the relevant zapi clients.
The registration hook function is the same for all daemons.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
The get API is used each time the VRF_DEFAULT_NAME macro is used.
The set API is not yet used.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
End user was seeing this debug but we are not giving
the user enough information to debug this on his own.
Add a tiny bit of extra information that could point
the user to solving the problem for themselves.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Add a header to cleanup no declaration and properly
wrapper some variables to appropriate #ifdef.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
We were ignoring mpls labels encapped with static routes.
Added support for single and multipath labels.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
watchfrr needs to handle a SIGCHLD also when it calls a global restart
command. Before this patch, it would lead to the following behavior:
15:44:28: zebra state -> down : unexpected read error: Connection reset by peer
15:44:33: Forked background command [pid 6392]: /usr/sbin/frr.init watchrestart all
15:44:53: Warning: restart all child process 6392 still running after 20 seconds, sending signal 15
15:44:53: waitpid returned status for an unknown child process 6392
15:44:53: background (unknown) process 6392 terminated due to signal 15
15:45:13: Warning: restart all child process 6392 still running after 40 seconds, sending signal 9
15:45:33: Warning: restart all child process 6392 still running after 60 seconds, sending signal 9
15:45:53: Warning: restart all child process 6392 still running after 80 seconds, sending signal 9
15:46:13: Warning: restart all child process 6392 still running after 100 seconds, sending signal 9
15:46:33: Warning: restart all child process 6392 still running after 120 seconds, sending signal 9
15:46:53: Warning: restart all child process 6392 still running after 140 seconds, sending signal 9
This is obviously incorrect and can be fixed by comparing the pid to
the global restart object as well.
Signed-off-by: Christian Franke <chris@opensourcerouting.org>
When we add / remove a nexthop that we need to track,
keep track of the number of times we have done this
for each nexthop. Consequently keep track of the
number of available nexthops, so that we can
just install new routes when we get one
that uses a pre-existing nexthop. Deletion of
nexthops is done on refcount going to 0.
Removal of routes is handled elsewhere for removal.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The code prior to this change, was allowing clients to register
for nexthop tracking. Then zebra would look up the rnh and
send to that particular client any known data. Additionally
zebra was blindly re-evaluating the rnh for every registration.
This leads to interesting behavior in that all people registered
for that nexthop will get callbacks even if nothing changes.
Modify the code to know if we have evaluated the rnh or not
and if so limit the re-evaluation to when absolutely necessary
This is of particular importance to do because of nht callbacks
for protocols cause those protocols to do not insignificant
work and as more protocols are registering for nht callbacks
we will cause more work than is necessary.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Linux 2.6.0 was released in December of 2003... I'm pretty sure we don't
need this Linux 2.4 support anymore.
Signed-off-by: David Lamparter <equinox@diac24.net>
These MIB OIDs were only used to identify clients on the SMUX protocol.
And even for that, they were essentially pointless.
Signed-off-by: David Lamparter <equinox@diac24.net>