Coverity is claiming that bfdd is able got have bglobal.bg_use_dplane
can be true, while dplane_addr can be uninitialized. Not really
possible since global variables are initialized to all 0's. In
any event. Force it to think it can't go there.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
After two single-hop sessions (*no local address are configured*) on two
interfaces are UP, remove one address of one interface, both of them
(actually, quite independent sessions) come to be DOWN, not just one.
Consider two boxes: A with `a1` and `a2` adddress on two interfaces,
and B with `b1` and `b2`.
Two sessions are set up and ok: `s1` with <a1,b1> and `s2` with <a2,b2>.
After `a1` of A is removed, there is an unhappy coincidence:
1) On A: `s1` changes local address, and sends <a2,b1> packets with help
of route.
2) On B: wrongly regarded <a2,b1> packets with non-zero remote descriminator
as part of `s2`, and are dropped for mismatched remote remote descriminator.
3) On A: `s1` sends <a2,b1> packets with zero remote descriminator to
initialize this session.
4) On B: wrongly regarded <a2,b1> packets with zero remote descriminator as
part of `s2`. Then `s2` will vibrate.
So the good sessions are overridden.
In this case, the <a2,b1> packets with zero remote descriminator won't take
effect until the current good sessions become bad.
Since single-hop sessions are allowed to be set without bound inteface in
current code, this commit adds one check in `bfd_recv_cb()` to avoid wrong
override.
Signed-off-by: anlan_cs <vic.lan@pica8.com>
Don't rely on the OS interface name length definition and use the FRR
definition instead.
Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
The "local_address" of bfd is only used in `show bfd peers brief`
for single hop sessions which are configured without "local address".
Since it is set by destination address of received packet, not
completely correct, so remove it.
Signed-off-by: ewlumpkin <ewlumpkin@gmail.com>
Signed-off-by: anlan_cs <vic.lan@pica8.com>
With the simple BFD configuration -
(active mode, single hop, without other parameters)
```
!
bfd
peer 11.11.11.11
exit
!
```
The interface with 11.11.11.0/24 is a *virtual* interface,
which can be deleted.
After BFD FSM is created and session is ok, do these things:
1) delete this interface
2) create this interface
3) set same ip address in this interface
Now, everything seems completely restored because all configuration
is same. But bad thing happens, BFD session hang on "down" status -
```
root# show bfd peer 11.11.11.11
BFD Peer:
peer 11.11.11.11 vrf default
ID: 638815827
Remote ID: 0
Active mode
Status: down
Downtime: 3 second(s)
Diagnostics: path down <- caused by destroyed interface
Remote diagnostics: ok
```
With the interface creating, `bfdd_sessions_enable_interface()`
wrongly compares added interface with the created, even key of
this `bfd_session` isn't binded with any interface. So this
`bfd_session` will hang on "down" status for ever.
So skip the compare in this case (no interface in key) to wake up
this `bfd_session`.
Signed-off-by: anlan_cs <vic.lan@pica8.com>
If time ( a uint64_t ) is large enough doing division
and subtraction can still lead to situations where
the resulting number is greater than a uint32_t.
Just use uint32_t as an intermediate storage spot.
This is unlikely to every occur in a time frame
I could possibly care about but makes Coverity happy.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Problem:
One is with active mode, the other is with passive mode. Sometimes
the one with active mode is in `Down` stauts, but the other one
with passive mode is unluckily stuck in `Init` status:
It doesn't answer its peer with any packets, even receiving continuous
`Down` packets.
Root Cause:
bfdd with passive mode answers its peer only *one* packet in `Down` status,
then it enters into `Init` status and ignores subsequent `Down` packets.
Unluckily that *one* answered packet is lost, at that moment its peer
with active mode can only have to send `Down` packets.
Fix:
1) With passive mode, bfdd should start xmittimer after received `Down` packet.
Refer to RFC5880:
"A system taking the Passive role MUST NOT begin sending BFD packets for
a particular session until it has received a BFD packet for that session, and
thus has learned the remote system's discriminator value."
2) Currently this added xmittimer for passive mode can be safely removed
except receiving `AdminDown` packet:
- `bfd_session_enable/bfd_set_passive_mode` doesn't start xmittimer
- `ptm_bfd_sess_dn/bfd_set_shutdown` can remove xmittimer
Per RFC5880, receiving `AdminDown` packet should be also regarded as `Down`,
so just call `ptm_bfd_sess_dn`, which will safely remove the added xmittimer
for passive mode. In summary, call `ptm_bfd_sess_dn` for two status changes
on receiving `AdminDown`: `Init`->`Down` and `Up`->`Down`.
Signed-off-by: anlan_cs <vic.lan@pica8.com>
A few optimizations for bfd NB:
- Remove unuseful checks for parameters with the same values
- Replace checking values of bfd parameters with YANG's "range"
- Append "required-echo-receive-interval" with 0 for it can be disabled
Signed-off-by: anlan_cs <vic.lan@pica8.com>
Since control packets may be dropped by ttl check, the counter
operation should be put after all check including ttl check.
Signed-off-by: anlan_cs <vic.lan@pica8.com>
Currently, it is possible to rename the default VRF either by passing
`-o` option to zebra or by creating a file in `/var/run/netns` and
binding it to `/proc/self/ns/net`.
In both cases, only zebra knows about the rename and other daemons learn
about it only after they connect to zebra. This is a problem, because
daemons may read their config before they connect to zebra. To handle
this rename after the config is read, we have some special code in every
single daemon, which is not very bad but not desirable in my opinion.
But things are getting worse when we need to handle this in northbound
layer as we have to manually rewrite the config nodes. This approach is
already hacky, but still works as every daemon handles its own NB
structures. But it is completely incompatible with the central
management daemon architecture we are aiming for, as mgmtd doesn't even
have a connection with zebra to learn from it. And it shouldn't have it,
because operational state changes should never affect configuration.
To solve the problem and simplify the code, I propose to expand the `-o`
option to all daemons. By using the startup option, we let daemons know
about the rename before they read their configs so we don't need any
special code to deal with it. There's an easy way to pass the option to
all daemons by using `frr_global_options` variable.
Unfortunately, the second way of renaming by creating a file in
`/var/run/netns` is incompatible with the new mgmtd architecture.
Theoretically, we could force daemons to read their configs only after
they connect to zebra, but it means adding even more code to handle a
very specific use-case. And anyway this won't work for mgmtd as it
doesn't have a connection with zebra. So I had to remove this option.
Signed-off-by: Igor Ryzhov <iryzhov@nfware.com>
When the detection time expires, we put the session down and restart the
timer. As the comment in the code says, it's needed to zero the remote
discriminator after the second expiration.
But the RFC clearly says that this must be done on the first expiration:
bfd.RemoteDiscr
The remote discriminator for this BFD session. This is the
discriminator chosen by the remote system, and is totally opaque
to the local system. This MUST be initialized to zero. If a
period of a Detection Time passes without the receipt of a valid,
authenticated BFD packet from the remote system, this variable
MUST be set to zero.
And we actually already do it in `ptm_bfd_sess_dn`, so there's no need
to reset the timer and wait for it twice.
Signed-off-by: Igor Ryzhov <iryzhov@nfware.com>
Per RFC 5880 section 6.8.12, the use of a Poll Sequence is not necessary
when the Detect Multiplier is changed. Currently, we update the Detection
Timeout only when a Poll Sequence is terminated, therefore we ignore the
Detect Multiplier change if it's not accompanied with RX/TX timer change.
To fix the problem, we should update the Detection Timeout on every
received packet.
Signed-off-by: Igor Ryzhov <iryzhov@nfware.com>
Since f60a1188 we store a pointer to the VRF in the interface structure.
There's no need anymore to store a separate vrf_id field.
Signed-off-by: Igor Ryzhov <iryzhov@nfware.com>
show/clear DEFUNs always require either peer label or IP address to be
specified, so if `label` is NULL then `peer_str` is definitely not NULL.
But Coverity doesn't know about that, so it complains about possible
NULL dereference of `peer_str`. This commit should make Coverity happy.
Signed-off-by: Igor Ryzhov <iryzhov@nfware.com>
This removes a giant `switch { }` block from lib/zclient.c and
harmonizes all zclient callback function types to be the same (some had
a subset of the args, some had a void return, now they all have
ZAPI_CALLBACK_ARGS and int return.)
Apart from getting rid of the giant switch, this is a minor security
benefit since the function pointers are now in a `const` array, so they
can't be overwritten by e.g. heap overflows for code execution anymore.
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
Well, there are some weird and duplicated checks there...
All we need is two simple checks:
- VRF existence. We must have it to enable the session.
- Interface existence. If it's configured for the session, we have to
bind the session to the interface.
This commit implements these checks and removes unnecessary duplication.
Signed-off-by: Igor Ryzhov <iryzhov@nfware.com>
We get the pointer to the interface on which the packet was received
right at the beginning of bfd_recv_cb. So let's use this pointer and
don't perform additional interface lookups.
Also explain in more detail how we process VRF id with different
backends.
Signed-off-by: Igor Ryzhov <iryzhov@nfware.com>
To ensure this, add a const modifier to functions' arguments. Would be
great do this initially and avoid this large code change, but better
late than never.
Signed-off-by: Igor Ryzhov <iryzhov@nfware.com>
FRR should only ever use the appropriate THREAD_ON/THREAD_OFF
semantics. This is espacially true for the functions we
end up calling the thread for.
Signed-off-by: Donatas Abraitis <donatas.abraitis@gmail.com>
Since there's very few locations where the `frr-format` actually prints
false positive warnings, consensus seems to be to just work around the
false positives even if the code is correct.
In fact, there is only one pattern of false positives currently, in
`bfdd/dplane.c` which does `vty_out("%"PRIu64, (uint64_t)be64toh(...))`.
The workaround/fix for this is a replacement `be64toh` whose type is
always `uint64_t` regardless of what OS we're on, making the cast
unnecessary.
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
There is a possibility that the same line can be matched as a command in
some node and its parent node. In this case, when reading the config,
this line is always executed as a command of the child node.
For example, with the following config:
```
router ospf
network 193.168.0.0/16 area 0
!
mpls ldp
discovery hello interval 111
!
```
Line `mpls ldp` is processed as command `mpls ldp-sync` inside the
`router ospf` node. This leads to a complete loss of `mpls ldp` node
configuration.
To eliminate this issue and all possible similar issues, let's print an
explicit "exit" at the end of every node config.
This commit also changes indentation for a couple of existing exit
commands so that all existing commands are on the same level as their
corresponding node-entering commands.
Fixes#9206.
Signed-off-by: Igor Ryzhov <iryzhov@nfware.com>
There's a padding byte between "mhop" and "peer" fields in this structure.
This structure is sometimes passed by value to functions and used in
assignments. The standard doesn't guarantee that the padding bytes are
copied on assignments. As this structure is used as a hash key, having
this padding byte with unspecified value can lead to unwanted behavior.
Fix the possible issue by making the "mhop" field to be 2 bytes. Also
make the struct packed as a precaution for future changes.
Signed-off-by: Igor Ryzhov <iryzhov@nfware.com>
Never send an interface name/index for multihop sessions. It breaks
"neighbor A.B.C.D update-source" config in BGP.
Signed-off-by: Igor Ryzhov <iryzhov@nfware.com>
Compile with v2.0.0 tag of `libyang2` branch of:
https://github.com/CESNET/libyang
staticd init load time of 10k routes now 6s vs ly1 time of 150s
Signed-off-by: Christian Hopps <chopps@labn.net>
The idea of the "with-defaults" flag is to show the default values for
parameters that were not configured by the user. But bfdd incorrectly
shows the default values for all parameters, including the
user-configured ones.
Signed-off-by: Igor Ryzhov <iryzhov@nfware.com>
Most of these are many, many years out of date. All of them vary
randomly in quality. They show up by default in packages where they
aren't really useful now that we use integrated config. Remove them.
The useful ones have been moved to the docs.
Signed-off-by: Quentin Young <qlyoung@nvidia.com>
When bfd node is removed, we must clear all NB entries set by its
children - sessions and profiles. Let's store some fake data as an entry
for the bfd node to be able to unset it later.
Signed-off-by: Igor Ryzhov <iryzhov@nfware.com>
Back when I put this together in 2015, ISO C11 was still reasonably new
and we couldn't require it just yet. Without ISO C11, there is no
"good" way (only bad hacks) to require a semicolon after a macro that
ends with a function definition. And if you added one anyway, you'd get
"spurious semicolon" warnings on some compilers...
With C11, `_Static_assert()` at the end of a macro will make it so that
the semicolon is properly required, consumed, and not warned about.
Consistently requiring semicolons after "file-level" macros matches
Linux kernel coding style and helps some editors against mis-syntax'ing
these macros.
Signed-off-by: David Lamparter <equinox@diac24.net>
Currently there is a single interval for both RX and TX echo functions.
This commit introduces separate RX and TX timers for echo packets.
The main advantage is to be able to set the receive interval to zero
when we don't want to receive echo packets from the remote system.
Signed-off-by: Igor Ryzhov <iryzhov@nfware.com>
Echo-mode implementation is currently broken. Instead of sending packets
to it's own address, bfdd is sending echo packets to the peer's address.
It may seem to work when testing between two FRR instances, because FRR
loops back such packets, but no other implementation is supposed to do
that.
Let's warn users that the current implementation works only between two
FRR instances.
Signed-off-by: Igor Ryzhov <iryzhov@nfware.com>
Currently this timer is only started when we receive the first echo
packet. If we never receive the packet, the timer is never started and
the user falsely assumes that echo function is working.
Signed-off-by: Igor Ryzhov <iryzhov@nfware.com>
Current behavior is inconsistent. When the session is created by another
daemon, it is up by default. When we later configure peer in bfdd, the
session is still up, but the NB layer thinks that it is down.
More than that, even when the session is created in bfdd using peer
command, it is created in DOWN state, not ADM_DOWN. And it actually
starts sending and receiving packets. The sessions is marked with
SHUTDOWN flag only when we try to reconfigure some parameter. This
behavior is also very unexpected.
Fixes#7780.
Signed-off-by: Igor Ryzhov <iryzhov@nfware.com>
RFC 5880 Section 6.8.4:
In Asynchronous mode, the Detection Time calculated in the local
system is equal to the value of Detect Mult received from the remote
system, multiplied by the agreed transmit interval of the remote
system (the greater of bfd.RequiredMinRxInterval and the last
received Desired Min TX Interval).
Signed-off-by: Igor Ryzhov <iryzhov@nfware.com>
It's not currently possible to configure echo mode in profile node:
```
(config)# bfd
(config-bfd)# profile test
(config-bfd-profile)# echo-mode
% Echo mode is only available for single hop sessions.
(config-bfd-profile)# echo-interval 20
% Echo mode is only available for single hop sessions.
```
Signed-off-by: Igor Ryzhov <iryzhov@nfware.com>
Currently it is possible to configure the same peer with and without
interface name:
```
bfd
peer 1.1.1.1
!
peer 1.1.1.1 interface enp0s3
!
```
There are multiple problems with that:
1. Both nodes actually control the same BFD session. So the config is
either duplicated or, even worse, different - and there is no way to
say which one actually works.
2. When the user deletes both nodes, the session is not actually freed,
because its refcount is always greater than 1.
Such configuration must be forbidden. User should either have single
node with wildcard name or multiple nodes with actual names.
Signed-off-by: Igor Ryzhov <iryzhov@nfware.com>
If local-address is not supplied, then an incorrect xpath is generated
which is not expected by NB CLI.
Fixes#7465.
Signed-off-by: Igor Ryzhov <iryzhov@nfware.com>
Valgrind is still reporting:
466020-==466020== by 0x11B9F4: main (bfdd.c:403)
466020-==466020== Address 0x5a7d544 is 84 bytes inside a block of size 272 free'd
466020:==466020== at 0x48399AB: free (vg_replace_malloc.c:538)
466020-==466020== by 0x490A947: qfree (memory.c:140)
466020-==466020== by 0x48F2AE8: if_delete (if.c:322)
466020-==466020== by 0x48F250D: if_destroy_via_zapi (if.c:195)
466020-==466020== by 0x497071E: zclient_interface_delete (zclient.c:2040)
466020-==466020== by 0x49745F6: zclient_read (zclient.c:3687)
466020-==466020== by 0x4955AEC: thread_call (thread.c:1684)
466020-==466020== by 0x48FF64E: frr_run (libfrr.c:1126)
466020-==466020== by 0x11B9F4: main (bfdd.c:403)
466020-==466020== Block was alloc'd at
466020:==466020== at 0x483AB65: calloc (vg_replace_malloc.c:760)
466020-==466020== by 0x490A805: qcalloc (memory.c:115)
466020-==466020== by 0x48F23D6: if_new (if.c:160)
466020-==466020== by 0x48F257F: if_create_name (if.c:214)
466020-==466020== by 0x48F3493: if_get_by_name (if.c:558)
466020-==466020== by 0x49705F2: zclient_interface_add (zclient.c:1989)
466020-==466020== by 0x49745E0: zclient_read (zclient.c:3684)
466020-==466020== by 0x4955AEC: thread_call (thread.c:1684)
466020-==466020== by 0x48FF64E: frr_run (libfrr.c:1126)
466020-==466020== by 0x11B9F4: main (bfdd.c:403)
Apparently the bs->ifp pointer is being set even in cases when
the bs->key.ifname is not being set. So go through and just
match the interface pointer and cut-to-the-chase.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
BFD key has optional fields "local" and "ifname" which can be empty when
the BFD session is created. In this case, the hash key will be calculated
with these fields filled with zeroes.
Later, when we're looking for the BFD session using the key with fields
"local" and "ifname" populated with actual values, the hash key will be
different. To work around this issue, we're doing multiple hash lookups,
first with full key, then with fields "local" and "ifname" filled with
zeroes.
But there may be another case when the initial key has the actual values
for "local" and "ifname", but the key we're using for lookup has empty
values. This case is covered for IPv4 by using additional hash walk with
bfd_key_lookup_ignore_partial_walker function but is not covered for IPv6.
Instead of introducing more hacks and workarounds, the following solution
is proposed:
- the hash key is always calculated in bfd_key_hash_do using only
required fields
- the hash data is compared in bfd_key_hash_cmp, taking into account the
fact that fields "local" and "ifname" may be empty
Using this solution, it's enough to make only one hash lookup.
Signed-off-by: Igor Ryzhov <iryzhov@nfware.com>
On shutdown, interfaces are deleted but if the bfd session
is down we retain the interface pointer. Remove the retained
pointer.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Valgrind reports:
2052866-==2052866==
2052866-==2052866== Syscall param sendmsg(msg.msg_name) points to uninitialised byte(s)
2052866:==2052866== at 0x49C8E13: sendmsg (sendmsg.c:28)
2052866-==2052866== by 0x11DC08: bp_udp_send (bfd_packet.c:823)
2052866-==2052866== by 0x11DD76: ptm_bfd_echo_snd (bfd_packet.c:179)
2052866-==2052866== by 0x114C2D: ptm_bfd_echo_xmt_TO (bfd.c:469)
2052866-==2052866== by 0x114C2D: ptm_bfd_echo_start (bfd.c:498)
2052866-==2052866== by 0x114C2D: bs_echo_timer_handler (bfd.c:1199)
2052866-==2052866== by 0x11E478: bfd_recv_cb (bfd_packet.c:702)
2052866-==2052866== by 0x4904846: thread_call (thread.c:1681)
2052866-==2052866== by 0x48CB4DF: frr_run (libfrr.c:1126)
2052866-==2052866== by 0x113044: main (bfdd.c:403)
2052866-==2052866== Address 0x1ffefff3e8 is on thread 1's stack
In ptm_bfd_echo_snd, for the v4 case we were memsetting the v6 memory
then setting the v4 memory. Just fix it.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
when receiving interface and address notifications, one may be puzzled
by the information since for example, the presence of an interface is
not enough to use it in a bfd session, simply because the interface is
in the wrong vrf. add VRF information on those traces.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
on vrf-lite environment, all incoming bfd packets are received by the
same socket on the default namespace. the vrfid is not relevant and
needs to be updated based on the incoming interface where traffic has
been received. If the traffic is received from an interface belonging to
a separate vrf, update the vrfid value accordingly.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
The vrf interface notification and interface notifications are separated
on zapi interface between the system (zebra daemon) and other daemons
(bfd for instance). In the case of bfd, the initial code was waiting for
vrf notification to create the socket. Actually, in vrf-lite world, we
need to wait the vrf interface to be present, in order to create the
socket and bind to the vrf interface (this is the usual way to work with
vrf-lite).
On bfd, the changes consist in delaying the socket creation first, then
when interface is created, check the interface name presence instead of
checking the interface configuration.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
When running in vrf-lite mode, the socket used in a vrf environment
should be bound to an interface belonging to the vrf. If no one is
selected, then the vrf interface itself should be bound to that socket,
so that outgoing packets are being applied routing rules for that vrf.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Replace the unclear error message:
```
% Failed to edit configuration.
YANG error(s):
Schema node not found.
YANG path: /frr-bfdd:bfdd/bfd/sessions/single-hop[dest-addr='192.168.253.6'][interface=''][vrf='default']/minimum-ttl
```
With:
```
frr(config-bfd-peer)# minimum-ttl 250
% Minimum TTL is only available for multi hop sessions.
! or
frr(config-bfd-peer)# echo
% Echo mode is only available for single hop sessions.
frr(config-bfd-peer)# echo-interval 300
% Echo mode is only available for single hop sessions.
```
Reported-by: Trae Santiago
Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
Don't reset interface/vrf pointer everytime a session is disabled
instead only do it when it was explicitly removed.
Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
Show BFD sessions updated counters by asking the data plane for this
information and show data plane statistics.
Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
Add hooks in the correct places so the BFD daemon uses the data plane
instead of the software packet sending implementation to monitor the
session.
This code also adds some handlers to support fallback to FRR BFD session
handling, however since this complicates the code it won't work at the
moment (the BFD sockets are disabled by default when using data plane).
Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
The current distributed BFD implementantion doesn't support falling back
to software implementation in FRR, so to keep the code simple lets give
the data plane full control of the BFD packet handling (helps running a
software data plane for testing too otherwise it would fail with 'address
in use' error).
Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
Distributed BFD is a term used for BFD implementations that do not run
on the routing engine, instead it is run on a data plane (software or
hardware based).
The current code implements the basic communication between FRR BFD
daemon with an external BFD data plane and defines the protocol format
in the file `bfddp_packet.h`.
To enable/use data plane you need to start BFD daemon with the command
line `--dplaneaddr <type>:<address>`, then a socket will be opened to
listen for incoming data plane connections.
Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
The BFD data plane header has definitions for the data plane
communication protocol that will be used to implement the distributed
BFD feature.
Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
Allows users with multiple links using same IPv6 address (same VRF) to
work.
Reported-by: Matti Suuronen
Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
The interface address delete callback from zebra was not
deleting the ifc that was created as per normal work methodologies
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Let the integration protocol always send the full configuration
instead of saving a few bytes. It will also allow protocols to specify
source address for IPv4 single hop connections and interface for multi
hop configuration.
Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
Change thread_cancel to take a ** to an event, NULL-check
before dereferencing, and NULL the caller's pointer. Update
many callers to use the new signature.
Signed-off-by: Mark Stapp <mjs@voltanet.io>