Effectively a massive search and replace of
`struct thread` to `struct event`. Using the
term `thread` gives people the thought that
this event system is a pthread when it is not
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
This commit contains fixes for the following issues found
- 'mgmt commit check' issued through 'vtysh -f' was actually commtting the changeset.
- On config validation failure backend, mgmtd was not passing the correct error-reason
to frontend.
- 'mgmt rollback ...' was reverting the change on backend, but config on mgmtd daemon
remains intact
Signed-off-by: Pushpasis Sarkar <pushpasis@gmail.com>
This commit introduces the MGMT Transaction framework that takes
management requests from one (or more) frontend client sessions,
translates them into transactions and drives them to completion
in co-oridination with one (or more) backend client daemons
involved in the request.
This commit includes the following functionalities in the changeset:
1. Introduces the actual Transaction module. Commands added related to
transaction are:
a. show mgmt transaction all
2. Adds support for commit rollback feature which stores upto the 10
commit buffers. Each commit has a commit-id which can be used to
rollback to the exact configuration state.
Commands supported for this feature are:
a. show mgmt commit-history
b. mgmt rollback commit-id COMMIT_ID
3. Add hidden commands to enable record various performance metrics:
a. mgmt performance-measurement
b. mgmt reset-statistic
Co-authored-by: Pushpasis Sarkar <pushpasis@gmail.com>
Co-authored-by: Abhinay Ramesh <rabhinay@vmware.com>
Co-authored-by: Ujwal P <ujwalp@vmware.com>
Signed-off-by: Yash Ranjan <ranjany@vmware.com>
This commit introduces the MGMT Backend Interface which can be used
by back-end management client daemons like BGPd, Staticd, Zebra to
connect with new FRR Management daemon (MGMTd) and utilize the new
FRR Management Framework to let any Frontend clients to retrieve any
operational data or manipulate any configuration data owned by the
individual Backend daemon component.
This commit includes the following functionalities in the changeset:
1. Add new Backend server for Backend daemons connect to.
2. Add a C-based Backend client library which can be used by daemons
to communicate with MGMTd via the Backend interface.
3. Maintain a backend adapter for each connection from an appropriate
Backend client to facilitate client requests and track one or more
transactions initiated from Frontend client sessions that involves
the backend client component.
4. Add the following commands to inspect various Backend client
related information
a. show mgmt backend-adapter all
b. show mgmt backend-yang-xpath-registry
c. show mgmt yang-xpath-subscription
Co-authored-by: Pushpasis Sarkar <pushpasis@gmail.com>
Co-authored-by: Abhinay Ramesh <rabhinay@vmware.com>
Co-authored-by: Ujwal P <ujwalp@vmware.com>
Signed-off-by: Yash Ranjan <ranjany@vmware.com>
This commit introduces the Frontend Interface which can be used
by front-end management clients like Netconf server, Restconf
Server and CLI to interact with new FRR Management daemon (MGMTd)
to access and sometimes modify FRR management data.
This commit includes the following functionalities in the changeset:
1. Add new Frontend server for clients connect to.
2. Add a C-based Frontend client library which can be used by Frontend
clients to communicate with MGMTd via the Frontend interface.
3. Maintain a frontend adapter for each connection from an appropriate
Frontend client to facilitate client requests and track one or more
client sessions across it.
4. Define the protobuf message format for messages to be exchanged
between MGMTd Frontend module and the Frontend client.
5. This changeset also introduces an instance of MGMT Frontend client
embedded within the lib/vty module that can be leveraged by any FRR
daemon to connect to MGMTd's Frontend interface. The same has been
integrated with and initialized within the MGMTd daemon's process
context to implement a bunch of 'set-config', 'commit-apply',
'get-config' and 'get-data' commands via VTYSH
Co-authored-by: Pushpasis Sarkar <pushpasis@gmail.com>
Co-authored-by: Abhinay Ramesh <rabhinay@vmware.com>
Co-authored-by: Ujwal P <ujwalp@vmware.com>
Signed-off-by: Yash Ranjan <ranjany@vmware.com>
Pass context argument by value on initialization to be clear that the
value is used/saved but not a pointer to the value. Previously the
northbound code was incorrectly holding a pointer to stack allocated
context structs.
However, the structure definition also had some musings (ifdef'd out
code) and a comment that might be taken to imply that user data could
follow the structure and thus be maintained by the code; it won't; so it
can't; so get rid of the disabled misleading code/text from the
structure definition.
The common use case worked b/c the transaction which cached the pointer
was created and freed inside a single function
call (`nb_condidate_commit`) that executed below the stack allocation.
All other use cases (grpc, confd, sysrepo, and -- coming soon -- mgmtd)
were bugs.
Signed-off-by: Christian Hopps <chopps@labn.net>
Rather than running selected source files through the preprocessor and a
bunch of perl regex'ing to get the list of all DEFUNs, use the data
collected in frr.xref.
This not only eliminates issues we've been having with preprocessor
failures due to nonexistent header files, but is also much faster.
Where extract.pl would take 5s, this now finishes in 0.2s. And since
this is a non-parallelizable build step towards the end of the build
(dependent on a lot of other things being done already), the speedup is
actually noticeable.
Also files containing CLI no longer need to be listed in `vtysh_scan`
since the .xref data covers everything. `#ifndef VTYSH_EXTRACT_PL`
checks are equally obsolete.
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
To ensure this, add a const modifier to functions' arguments. Would be
great do this initially and avoid this large code change, but better
late than never.
Signed-off-by: Igor Ryzhov <iryzhov@nfware.com>
Instead of sorting each command one-by-one using listnode_add_sort, add
them to the list without sorting and then sort the list only once.
Signed-off-by: Igor Ryzhov <iryzhov@nfware.com>
The backoff code assumed that yang operations always completed quickly.
It checked for > 100 YANG modeled commands happening in under 1 second
to enable batching. If 100 yang modeled commands always take longer than
1 second batching is never enabled. This is the exact opposite of what
we want to happen since batching speeds the operations up.
Here are the results for libyang2 code without and with batching.
| action | 1K rts | 2K rts | 1K rts | 2K rts | 20k rts |
| | nobatch | nobatch | batch | batch | batch |
| Add IPv4 | .881 | 1.28 | .703 | 1.04 | 8.16 |
| Add Same IPv4 | 28.7 | 113 | .590 | .860 | 6.09 |
| Rem 1/2 IPv4 | .376 | .442 | .379 | .435 | 1.44 |
| Add Same IPv4 | 28.7 | 113 | .576 | .841 | 6.02 |
| Rem All IPv4 | 17.4 | 71.8 | .559 | .813 | 5.57 |
(IPv6 numbers are basically the same as iPv4, a couple percent slower)
Clearly we need this. Please note the growth (1K to 2K) w/o batching is
non-linear and 100 times slower than batched.
Notes on code: The use of the new `nb_cli_apply_changes_clear_pending`
is to commit any pending changes (including the current one). This is
done when the code would not correctly handle a single diff that
included the current changes with possible following changes. For
example, a "no" command followed by a new value to replace it would be
merged into a change, and the code would not deal well with that. A good
example of this is BGP neighbor peer-group changing. The other use is
after entering a router level (e.g., "router bgp") where the follow-on
command handlers expect that router object to now exists. The code
eventually needs to be cleaned up to not fail in these cases, but that
is for future NB cleanup.
Signed-off-by: Christian Hopps <chopps@labn.net>
Compile with v2.0.0 tag of `libyang2` branch of:
https://github.com/CESNET/libyang
staticd init load time of 10k routes now 6s vs ly1 time of 150s
Signed-off-by: Christian Hopps <chopps@labn.net>
... by referencing all autogenerated headers relative to the root
directory. (90% of the changes here is `version.h`.)
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
This command doesn't rely on transactional CLI and works perfectly for
daemons converted to northbound configuration.
Signed-off-by: Igor Ryzhov <iryzhov@nfware.com>
Replace all lib/thread cancel macros, use thread_cancel()
everywhere. Only the THREAD_OFF macro and thread_cancel() api are
supported. Also adjust thread_cancel_async() to NULL caller's pointer (if
present).
Signed-off-by: Mark Stapp <mjs@voltanet.io>
Display human readable error message in northbound rpc
transaction failure. In case of vtysh nb client, the error
message will be displayed to user.
Testing:
bharat# clear evpn dup-addr vni 1002 ip 11.11.11.11
Error type: generic error
Error description: Requested IP's associated MAC aa:aa:aa:aa:aa:aa is still
in duplicate state
Signed-off-by: Chirag Shah <chirag@nvidia.com>
Whenever libyang loads a module that contains a leafref, it will
also implicitly load the module of the referring node if it's
not loaded already. That makes sense as otherwise it wouldn't be
possible to validate the leafref value correctly.
The problem is that loading a module implicitly violates the
assumption of the northbound layer that all loaded modules
are implemented (i.e. they have a northbound node associated
to each schema node). This means that loading a module that
isn't implemented can lead to crashes as the "priv" pointer
of schema nodes is no longer guaranteed to be valid. To fix this
problem, add a few null checks to ignore data nodes associated
to non-implemented modules.
The side effect of this change is harmless. If a daemon receives
configuration it doesn't support (e.g. BFD peers on staticd),
that configuration will be stored but otherwise ignored. This can
only happen when using a northbound client like gRPC, as the CLI
will never send to a daemon a command it doesn't support. This
minor problem should go away in the long run as FRR migrates to
a centralized management model, at which point the YANG-modeled
configuration of all daemons will be maintained in a single place.
Finally, update some daemons to stop implementing YANG modules
they don't need to (i.e. revert 1b741a01c and a74b47f5).
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
When not using the transactional CLI mode, do not display a
warning when a YANG-modeled commmand doesn't perform any effective
configuration change.
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
While a configuration transaction can't be rejected once it reaches
the APPLY phase, we should allow NB callbacks to generate error
or warning messages when a configuration change is being applied.
That should be useful, for example, to return warnings back to
the user informing that the applied configuration has some kind of
inconsistency or is missing something in order to be effectively
activated. The infrastructure for this was already present, but the
northbound layer was ignoring all errors/warnings generated during
the apply/abort phases instead of returning them to the user. This
commit changes that.
In the gRPC plugin, extend the Commit() RPC adding a new
"error_message" field to the response type. This is necessary to
allow errors/warnings to be returned even when the commit operation
succeeds (since grpc::Status::OK doesn't support error messages
like the other status codes).
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
When using the default CLI mode, the northbound layer needs to create
a separate transaction to process each YANG-modeled command since
they are supposed to be applied immediately (there's no candidate
configuration nor the "commit" command like in the transactional
CLI). The problem is that configuration transactions have an overhead
associated to them, in big part because of the use of some heavy
libyang functions like `lyd_validate()` and `lyd_diff()`. As of
now this overhead is substantial and doesn't scale well when large
numbers of transactions need to be performed in sequence.
As an example, loading 50k prefix-lists using a single transaction
takes about 2 seconds on a modern CPU. Loading the same 50k
prefix-lists using 50k transactions can take more than an hour
to complete (which is unacceptable by any standard). To fix this
problem, some heavy optimization work needs to be done on libyang and
on the FRR northbound itself too (e.g. perform partial configuration
diffs whenever possible). This, however, should be a long term
effort since these optimizations shouldn't be trivial to implement
and we're far from having the performance numbers we need.
In the meanwhile, this commit introduces a simple but efficient
workaround to alleviate the issue. In short, a new back-off timer
was introduced in the CLI to monitor and detect when too many
YANG-modeled commands are being received at the same time. When
a certain threshold is reached (100 YANG-modeled commands within
one second), the northbound starts to group all subsequent commands
into a single large transaction, which allows them to be processed
much faster (e.g. seconds and not hours). It's essentially a
protection mechanism that creates dynamically-sized transactions
when necessary to prevent performance issues from happening. This
mechanism is enabled both when parsing configuration files and when
reading commands from a terminal.
The downside of this optimization is that, if several YANG-modeled
commands are grouped into the same transaction and at least one of
them fails, the whole transaction is rejected. This is undesirable
since users don't expect transactional behavior when that's not
enabled explicitly. To minimize this issue, the CLI will log all
commands that were rejected whenever that happens, to make the
user aware of what happened and have enough information to fix
the problem. Commands that fail due to parsing errors or CLI-level
validations in general are rejected separately.
Again, this proposed workaround is intended to be temporary. The
goal is to provided a quick fix to issues like #6658 while we work
on better long-term solutions.
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
Instead of returning only error codes (e.g. NB_ERR_VALIDATION)
to the northbound clients, do better than that and also return
a human-readable error message. This should make FRR more
automation-friendly since operators won't need to dig into system
logs to find out what went wrong in the case of an error.
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
The new northbound context structure contains information about
the client performing a configuration transaction. This information
will be made available to all configuration callbacks through the
args->context parameter.
The usefulness of this structure comes from the fact that it can be
used as a communication channel (both input and output) between the
northbound callbacks and the northbound clients. This can be done
through its "client_data" field which contains client-specific data.
This should cover some very specific scenarios where a northbound
callback should perform an action only if the configuration change
is coming from a given client. An example would be sending a PCEP
response to a PCE when an SR-TE policy is created or modified
through the PCEP northbound client (for that to happen, the
northbound callbacks need to have access to the PCEP request ID,
which needs to be available).
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
And again for the name. Why on earth would we centralize this, just so
people can forget to update it?
Signed-off-by: David Lamparter <equinox@diac24.net>
There is really no reason to not put this in the cmd_node.
And while we're add it, rename from pointless ".func" to ".config_write".
[v2: fix forgotten ldpd config_write]
Signed-off-by: David Lamparter <equinox@diac24.net>
The only nodes that have this as 0 don't have a "->func" anyway, so the
entire thing is really just pointless.
Signed-off-by: David Lamparter <equinox@diac24.net>
Since we've been writing out "frr version" and "frr defaults" for about
a year and a half now, we can now actually use them to manage defaults.
Signed-off-by: David Lamparter <equinox@diac24.net>
Commit 5e6a9350c1 implemented an optimization where candidate
configurations are validated only before being displayed. The
validation is done only to create default child nodes (due to
how libyang works) and any possible error is ignored (candidate
configurations can be invalid/incomplete).
The problem is that we were calling lyd_validate() only when the
CLI "with-defaults" option was used. But some cli_show() callbacks
assume that default nodes exist and can crash when displaying a
candidate configuration that isn't validated. To fix this, call
lyd_validate() before displaying candidate configuration even when
"with-defaults" is not used (that was a micro-optimization that
shouldn't have been done).
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
Guard the libyang debug messages under this command so that only
people interested on those messages will see them.
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
The dnode member of the nb_config structure can be null on
daemons that don't implement any YANG module. As such, update
the nb_cli_show_config_prepare() function to always check if the
libyang data node that is going to be displayed is null or not
before operating on it.
This fixes the following warning (introduced by commit 5e6a9350c1):
libyang: Invalid arguments (lyd_schema_sort())
Reported-by: Donald Sharp <sharpd@cumulusnetworks.com>
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
nb_candidate_edit() was calling both the lyd_schema_sort() and
lyd_validate() functions whenever a new node was added to the
candidate configuration. This was done to ensure the candidate
is always ready to be displayed correctly (libyang only creates
default child nodes during the validation process, and data nodes
aren't guaranteed to be ordered by default).
The problem is that the two aforementioned functions are too
expensive to be called in the northbound hot path. Instead, it makes
more sense to call them only before displaying the configuration
(in which case a recursive sort needs to be done). Introduce the
nb_cli_show_config_prepare() to achieve that purpose.
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
The nb_cli_apply_changes() function was creating a full copy of
the candidate configuration before editing it. This excerpt from
the northbond documentation explains why this was being done:
"NOTE: the nb_cli_cfg_change() function clones the candidate
configuration before actually editing it. This way, if any error
happens during the editing, the original candidate is restored to
avoid inconsistencies. Either all changes from the configuration
command are performed successfully or none are. It's like a
mini-transaction but happening on the candidate configuration
(thus the northbound callbacks are not involved)".
The problem is that this kind of error handling is just too
expensive. A command should never fail to edit the candidate
configuration unless there's a bug in the code (e.g. when the
CLI wrapper command passes an integer value that YANG rejects due
to a "range" statement). In such cases, a command might fail to
be applied or applied only partially if it edits multiple YANG
nodes. When that happens, just log an error to make the operator
aware of the problem, but otherwise ignore it instead of rejecting
the command and restoring the candidate to its previous state. We
shouldn't add an extreme overhead to the northbound CLI client only
to handle errors that should never happen in practice.
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
Adding a lock to protect the global running configuration doesn't
help much since the FRR daemons are not prepared to process
configuration changes in a pthread that is not the main one (a
whole lot of new protections would be necessary to prevent race
conditions).
This means the lock added by commit 83981138 only adds more
complexity for no benefit. Remove it now to simplify the code.
All northbound clients, including the gRPC one, should either run
in the main pthread or use synchronization primitives to process
configuration transactions in the main pthread.
This reverts commit 83981138fe.