When `terminal monitor` is issued I am seeing this for valgrind on freebsd:
2022/03/24 18:07:45 ZEBRA: [RHJDG-5FNSK][EC 100663304] can't open configuration file [/usr/local/etc/frr/zebra.conf]
==52993== Syscall param sendmsg(sendmsg.msg_control) points to uninitialised byte(s)
==52993== at 0x4CE268A: _sendmsg (in /lib/libc.so.7)
==52993== by 0x4B96245: ??? (in /lib/libthr.so.3)
==52993== by 0x4CDF329: sendmsg (in /lib/libc.so.7)
==52993== by 0x49A9994: vtysh_do_pass_fd (vty.c:2041)
==52993== by 0x49A9994: vtysh_flush (vty.c:2070)
==52993== by 0x499F4CE: thread_call (thread.c:2002)
==52993== by 0x495D317: frr_run (libfrr.c:1196)
==52993== by 0x2B4068: main (main.c:471)
==52993== Address 0x7fc000864 is on thread 1's stack
==52993== in frame #3, created by vtysh_flush (vty.c:2065)
Fix by initializing the memory to `0`
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
This adds the plumbing necessary to yield back a file descriptor to
vtysh. The fd is passed on the command status code bytes through
AF_UNIX SCM_RIGHTS.
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
... this is copypasted all over the codebase & should've been a helper
to begin with really.
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
These are just used to iterate over active vty sessions, a vector is a
weird choice there.
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
These had no remaining users for a while now. The logging backend has
its own list of receivers.
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
Do not return pointer to the newly created thread from various thread_add
functions. This should prevent developers from storing a thread pointer
into some variable without letting the lib know that the pointer is
stored. When the lib doesn't know that the pointer is stored, it doesn't
prevent rescheduling and it can lead to hard to find bugs. If someone
wants to store the pointer, they should pass a double pointer as the last
argument.
Signed-off-by: Igor Ryzhov <iryzhov@nfware.com>
There is a possibility that the same line can be matched as a command in
some node and its parent node. In this case, when reading the config,
this line is always executed as a command of the child node.
For example, with the following config:
```
router ospf
network 193.168.0.0/16 area 0
!
mpls ldp
discovery hello interval 111
!
```
Line `mpls ldp` is processed as command `mpls ldp-sync` inside the
`router ospf` node. This leads to a complete loss of `mpls ldp` node
configuration.
To eliminate this issue and all possible similar issues, let's print an
explicit "exit" at the end of every node config.
This commit also changes indentation for a couple of existing exit
commands so that all existing commands are on the same level as their
corresponding node-entering commands.
Fixes#9206.
Signed-off-by: Igor Ryzhov <iryzhov@nfware.com>
...really no reason to force this into a compile time decision. The
only point is avoiding the getrusage() syscall, which can easily be a
runtime decision.
[v2: also split cputime & walltime limits]
Signed-off-by: David Lamparter <equinox@diac24.net>
This commit fixes the following problem:
- enter the interface node
- move the interface to another VRF
- try to continue configuring the interface
It is not possible to continue configuration because the XPath stored in
the vty doesn't correspond with the actual state of the system anymore.
For example:
```
nfware# conf
nfware(config)# interface enp2s0
<-- move the enp2s0 to a different VRF -->
nfware(config-if)# ip router isis 1
% Failed to get iface dnode in candidate DB
```
To fix the issue, go through all connected vty shells and update the
stored XPath.
Suggested-by: Renato Westphal <renato@opensourcerouting.org>
Signed-off-by: Igor Ryzhov <iryzhov@nfware.com>
The backoff code assumed that yang operations always completed quickly.
It checked for > 100 YANG modeled commands happening in under 1 second
to enable batching. If 100 yang modeled commands always take longer than
1 second batching is never enabled. This is the exact opposite of what
we want to happen since batching speeds the operations up.
Here are the results for libyang2 code without and with batching.
| action | 1K rts | 2K rts | 1K rts | 2K rts | 20k rts |
| | nobatch | nobatch | batch | batch | batch |
| Add IPv4 | .881 | 1.28 | .703 | 1.04 | 8.16 |
| Add Same IPv4 | 28.7 | 113 | .590 | .860 | 6.09 |
| Rem 1/2 IPv4 | .376 | .442 | .379 | .435 | 1.44 |
| Add Same IPv4 | 28.7 | 113 | .576 | .841 | 6.02 |
| Rem All IPv4 | 17.4 | 71.8 | .559 | .813 | 5.57 |
(IPv6 numbers are basically the same as iPv4, a couple percent slower)
Clearly we need this. Please note the growth (1K to 2K) w/o batching is
non-linear and 100 times slower than batched.
Notes on code: The use of the new `nb_cli_apply_changes_clear_pending`
is to commit any pending changes (including the current one). This is
done when the code would not correctly handle a single diff that
included the current changes with possible following changes. For
example, a "no" command followed by a new value to replace it would be
merged into a change, and the code would not deal well with that. A good
example of this is BGP neighbor peer-group changing. The other use is
after entering a router level (e.g., "router bgp") where the follow-on
command handlers expect that router object to now exists. The code
eventually needs to be cleaned up to not fail in these cases, but that
is for future NB cleanup.
Signed-off-by: Christian Hopps <chopps@labn.net>
Incorporate into the `show thread cpu` the number of times
we have issued warnings about a particular thread being
too slow.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
When generating SLOW_THREAD warnings let's differentiate between
a cpu bound process and a wall bound process. Effectively
a slow thread can now be a process in FRR doing lots of work( cpu bound )
or wall bound ( the cpu is heavy load and a FRR process may be pre-empted
and never scheduled ).
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Back when I put this together in 2015, ISO C11 was still reasonably new
and we couldn't require it just yet. Without ISO C11, there is no
"good" way (only bad hacks) to require a semicolon after a macro that
ends with a function definition. And if you added one anyway, you'd get
"spurious semicolon" warnings on some compilers...
With C11, `_Static_assert()` at the end of a macro will make it so that
the semicolon is properly required, consumed, and not warned about.
Consistently requiring semicolons after "file-level" macros matches
Linux kernel coding style and helps some editors against mis-syntax'ing
these macros.
Signed-off-by: David Lamparter <equinox@diac24.net>
CR, NL and CRNL are all OK, but CRNL shouldn't get treated as 2
newlines (which causes an empty command to be executed => empty prompt
line.)
Signed-off-by: David Lamparter <equinox@diac24.net>
The FDs are in struct vty, and there's ->fd and ->wfd, which shouldn't
be confused. Passing vty_sock along separately just creates mixups.
Signed-off-by: David Lamparter <equinox@diac24.net>
The logging code writes log messages with a `\n` line ending, meanwhile
the VTY code switches it so you need `\r\n`...
And we don't flush the newline after executing a command either.
After this patch, starting daemons like `zebra/zebra -t` should provide
a nice development/debugging experience with a VTY open right there on
stdio and `log stdout` interspersed.
(This is already documented in the man pages, it just looked like sh*t
previously since the log messages didn't newline correctly.)
Signed-off-by: David Lamparter <equinox@diac24.net>
The return from sockunion2hostprefix tells us if the conversion
succeeded or not. There are places in the code where we
always assume that it just `works`, since it can fail
notice and try to do the right thing.
Please note that failure of this function for most cases
of sockunion2hostprefix is highly highly unlikely as that
the sockunion was already created and tested elsewhere
it's just that this function can fail.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
The Solaris code has gone through a deprecation cycle. No-one
has said anything to us and worse of all we don't have any test
systems running Solaris to know if we are making changes that
are breaking on Solaris. Remove it from the system so
we can clean up a bit.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
* use actual error code instead of "false"
* add missing new line
Before:
```
nfware# show interface | include (a]
% Regex compilation error: Success% Bad regexp '(a]'
% Unknown command: show interface | include (a]
```
After:
```
nfware# show interface | include (a]
% Regex compilation error: Unmatched ( or \(
% Bad regexp '(a]'
% Unknown command: show interface | include (a]
```
Signed-off-by: Igor Ryzhov <iryzhov@nfware.com>
When using the default CLI mode, the northbound layer needs to create
a separate transaction to process each YANG-modeled command since
they are supposed to be applied immediately (there's no candidate
configuration nor the "commit" command like in the transactional
CLI). The problem is that configuration transactions have an overhead
associated to them, in big part because of the use of some heavy
libyang functions like `lyd_validate()` and `lyd_diff()`. As of
now this overhead is substantial and doesn't scale well when large
numbers of transactions need to be performed in sequence.
As an example, loading 50k prefix-lists using a single transaction
takes about 2 seconds on a modern CPU. Loading the same 50k
prefix-lists using 50k transactions can take more than an hour
to complete (which is unacceptable by any standard). To fix this
problem, some heavy optimization work needs to be done on libyang and
on the FRR northbound itself too (e.g. perform partial configuration
diffs whenever possible). This, however, should be a long term
effort since these optimizations shouldn't be trivial to implement
and we're far from having the performance numbers we need.
In the meanwhile, this commit introduces a simple but efficient
workaround to alleviate the issue. In short, a new back-off timer
was introduced in the CLI to monitor and detect when too many
YANG-modeled commands are being received at the same time. When
a certain threshold is reached (100 YANG-modeled commands within
one second), the northbound starts to group all subsequent commands
into a single large transaction, which allows them to be processed
much faster (e.g. seconds and not hours). It's essentially a
protection mechanism that creates dynamically-sized transactions
when necessary to prevent performance issues from happening. This
mechanism is enabled both when parsing configuration files and when
reading commands from a terminal.
The downside of this optimization is that, if several YANG-modeled
commands are grouped into the same transaction and at least one of
them fails, the whole transaction is rejected. This is undesirable
since users don't expect transactional behavior when that's not
enabled explicitly. To minimize this issue, the CLI will log all
commands that were rejected whenever that happens, to make the
user aware of what happened and have enough information to fix
the problem. Commands that fail due to parsing errors or CLI-level
validations in general are rejected separately.
Again, this proposed workaround is intended to be temporary. The
goal is to provided a quick fix to issues like #6658 while we work
on better long-term solutions.
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
Remove mid-string line breaks, cf. workflow doc:
.. [#tool_style_conflicts] For example, lines over 80 characters are allowed
for text strings to make it possible to search the code for them: please
see `Linux kernel style (breaking long lines and strings)
<https://www.kernel.org/doc/html/v4.10/process/coding-style.html#breaking-long-lines-and-strings>`_
and `Issue #1794 <https://github.com/FRRouting/frr/issues/1794>`_.
Scripted commit, idempotent to running:
```
python3 tools/stringmangle.py --unwrap `git ls-files | egrep '\.[ch]$'`
```
Signed-off-by: David Lamparter <equinox@diac24.net>
Instead of returning only error codes (e.g. NB_ERR_VALIDATION)
to the northbound clients, do better than that and also return
a human-readable error message. This should make FRR more
automation-friendly since operators won't need to dig into system
logs to find out what went wrong in the case of an error.
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
The new northbound context structure contains information about
the client performing a configuration transaction. This information
will be made available to all configuration callbacks through the
args->context parameter.
The usefulness of this structure comes from the fact that it can be
used as a communication channel (both input and output) between the
northbound callbacks and the northbound clients. This can be done
through its "client_data" field which contains client-specific data.
This should cover some very specific scenarios where a northbound
callback should perform an action only if the configuration change
is coming from a given client. An example would be sending a PCEP
response to a PCE when an SR-TE policy is created or modified
through the PCEP northbound client (for that to happen, the
northbound callbacks need to have access to the PCEP request ID,
which needs to be available).
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
Replace sprintf with snprintf where straightforward to do so.
- sprintf's into local scope buffers of known size are replaced with the
equivalent snprintf call
- snprintf's into local scope buffers of known size that use the buffer
size expression now use sizeof(buffer)
- sprintf(buf + strlen(buf), ...) replaced with snprintf() into temp
buffer followed by strlcat
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
Rather than doing a f*gly hack for the RPKI code, let's do an on-exit
hook in cmd_node. Also allows replacing some special-casing in the vty
code.
Signed-off-by: David Lamparter <equinox@diac24.net>
And again for the name. Why on earth would we centralize this, just so
people can forget to update it?
Signed-off-by: David Lamparter <equinox@diac24.net>
Same as before, instead of shoving this into a big central list we can
just put the parent node in cmd_node.
Signed-off-by: David Lamparter <equinox@diac24.net>
There is really no reason to not put this in the cmd_node.
And while we're add it, rename from pointless ".func" to ".config_write".
[v2: fix forgotten ldpd config_write]
Signed-off-by: David Lamparter <equinox@diac24.net>
The only nodes that have this as 0 don't have a "->func" anyway, so the
entire thing is really just pointless.
Signed-off-by: David Lamparter <equinox@diac24.net>