Commit Graph

4887 Commits

Author SHA1 Message Date
David Lamparter
d51683e6fe
Merge pull request #10868 from donaldsharp/zlog_backtrace_uninited 2022-03-25 06:25:39 +01:00
David Lamparter
4a090881c8
Merge pull request #10852 from mjstapp/fix_lib_distclean 2022-03-25 06:03:10 +01:00
Donald Sharp
8486b790fe lib: Prevent uninitialized bytes
When using zlog_backtrace I am seeing this:

==66286== Syscall param write(buf) points to uninitialised byte(s)
==66286==    at 0x4CDF48A: syscall (in /lib/libc.so.7)
==66286==    by 0x4A0D409: ??? (in /usr/local/lib/libunwind.so.8.0.1)
==66286==    by 0x4A0D694: ??? (in /usr/local/lib/libunwind.so.8.0.1)
==66286==    by 0x4A0E2F4: _ULx86_64_step (in /usr/local/lib/libunwind.so.8.0.1)
==66286==    by 0x49662DB: zlog_backtrace (log.c:250)
==66286==    by 0x2AFFA6: if_get_mtu (ioctl.c:163)
==66286==    by 0x2B2D9D: ifan_read (kernel_socket.c:457)
==66286==    by 0x2B2D9D: kernel_read (kernel_socket.c:1406)
==66286==    by 0x499F46E: thread_call (thread.c:2002)
==66286==    by 0x495D2B7: frr_run (libfrr.c:1196)
==66286==    by 0x2B4098: main (main.c:471)
==66286==  Address 0x7fc000000 is on thread 1's stack
==66286==  in frame #4, created by zlog_backtrace (log.c:239)
==66286==

Let's initialize some data

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2022-03-24 20:37:44 -04:00
Donald Sharp
b4c94f8c4d lib: Fix terminal monitor uninited memory usage on freebsd
When `terminal monitor` is issued I am seeing this for valgrind on freebsd:

2022/03/24 18:07:45 ZEBRA: [RHJDG-5FNSK][EC 100663304] can't open configuration file [/usr/local/etc/frr/zebra.conf]
==52993== Syscall param sendmsg(sendmsg.msg_control) points to uninitialised byte(s)
==52993==    at 0x4CE268A: _sendmsg (in /lib/libc.so.7)
==52993==    by 0x4B96245: ??? (in /lib/libthr.so.3)
==52993==    by 0x4CDF329: sendmsg (in /lib/libc.so.7)
==52993==    by 0x49A9994: vtysh_do_pass_fd (vty.c:2041)
==52993==    by 0x49A9994: vtysh_flush (vty.c:2070)
==52993==    by 0x499F4CE: thread_call (thread.c:2002)
==52993==    by 0x495D317: frr_run (libfrr.c:1196)
==52993==    by 0x2B4068: main (main.c:471)
==52993==  Address 0x7fc000864 is on thread 1's stack
==52993==  in frame #3, created by vtysh_flush (vty.c:2065)

Fix by initializing the memory to `0`

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2022-03-24 18:08:29 -04:00
Mark Stapp
f0368a3e43 build: remove generated lib files during distclean
Remove a couple of lex/yacc output files in lib/
during 'distclean'.

Signed-off-by: Mark Stapp <mstapp@nvidia.com>
2022-03-23 09:03:14 -04:00
Anuradha Karuppiah
7b0db0e43f lib, bgpd: changes for EAD-per-ES fragmentation
The EAD-per-ES route carries ECs for all the ES-EVI RTs. As the number of VNIs
increase all RTs do not fit into a standard BGP UPDATE (4K) so the route needs
to be fragmented.

Each fragment is associated with a separate RD and frag-id -
1. Local ES-per-EAD -
ES route table - {ES-frag-ID, ESI, ET=0xffffffff, VTEP-IP}
global route table - {RD-=ES-frag-RD, ESI, ET=0xffffffff}
2. Remote ES-per-EAD -
VNI route table - {ESI, ET=0xffffffff, VTEP-IP}
global route table - {RD-=ES-frag-RD, ESI, ET=0xffffffff}

Note: The fragment ID is abandoned in the per-VNI routing table. At this
point that is acceptable as we dont expect more than one-ES-per-EAD fragment
to be imported into the per-VNI routing table. But that may need to be
re-worked at a later point.

CLI changes (sample with 4 VNIs per-fragment for experimental pruposes) -
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
root@torm-11:mgmt:~# vtysh -c "show bgp l2vpn evpn es 03:44:38:39:ff:ff:01:00:00:01"
ESI: 03:44:38:39:ff:ff:01:00:00:01
 Type: LR
 RD: 27.0.0.21:3
 Originator-IP: 27.0.0.21
 Local ES DF preference: 50000
 VNI Count: 10
 Remote VNI Count: 10
 VRF Count: 3
 MACIP EVI Path Count: 33
 MACIP Global Path Count: 198
 Inconsistent VNI VTEP Count: 0
 Inconsistencies: -
 Fragments: >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
  27.0.0.21:3 EVIs: 4
  27.0.0.21:13 EVIs: 4
  27.0.0.21:22 EVIs: 2
 VTEPs:
  27.0.0.22 flags: EA df_alg: preference df_pref: 32767
  27.0.0.23 flags: EA df_alg: preference df_pref: 32767

root@torm-11:mgmt:~# vtysh -c "show bgp l2vpn evpn es-evi vni 1002 detail"
VNI: 1002 ESI: 03:44:38:39:ff:ff:01:00:00:01
 Type: LR
 ES fragment RD: 27.0.0.21:13 >>>>>>>>>>>>>>>>>>>>>>>>>
 Inconsistencies: -
 VTEPs: 27.0.0.22(EV),27.0.0.23(EV)

>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>

PS: The number of EVIs per-fragment has been set to 128 and may need further
tuning.

Ticket: #2632967

Signed-off-by: Anuradha Karuppiah <anuradhak@nvidia.com>
2022-03-18 07:37:06 -04:00
ron
5e7d0337f6 lib: wheel's typo fix
The wheel data structure is a array of list pointers
but the alloc for it is using the sizeof (struct listnode *)
as the amount to allocate.  Even though the (struct listnode *)
and (struct list *) sizes are the same, let's list the correct
values.

Signed-off-by: ron <lyq140hf2006@163.com>
2022-03-16 15:32:50 -04:00
Donald Sharp
335b15ebe0
Merge pull request #10797 from fabioantonini/sysrepo-2.0.41-support
support to sysrepo-2.0.41
2022-03-16 14:33:53 -04:00
Christian Hopps
ad6bd53692
lib: grpc: fix covevrity warnings
One uninit warning and one missing lock warning, both were OK but
let's make the tool happy.

Signed-off-by: Christian Hopps <chopps@labn.net>
2022-03-16 11:31:50 -04:00
Donatas Abraitis
f1ea200891
Merge pull request #10763 from donaldsharp/plist_speedup
lib: Convert prefix_master->str to a RB Tree
2022-03-16 13:42:18 +02:00
Fabio Antonini
7b73f3db94 lib: support to sysrepo-2.0.41
northbound_sysrepo.c fixed to use the newer APIs from sysrepo 2.0.41

Signed-off-by: Fabio Antonini <f.antonini@tiesse.com>
2022-03-15 16:30:00 +01:00
Donald Sharp
7344eb1f62
Merge pull request #10739 from LabNConsulting/chopps/fixgrpc-reorg
grpc, lib: grpc cleanup/reorg
2022-03-15 09:10:03 -04:00
Donald Sharp
d09a7c82c9
Merge pull request #10565 from lyq140/patch-thread
lib: not thread off when schedule
2022-03-15 08:40:41 -04:00
Christian Hopps
48c93061f5 lib: grpc: rework RPC handlers improve code clarity
- split NewRpcState object into 2, a Unary and a Streaming variant, which
  then allows for the next.
- move all state machine details inside these new state objects
  - use a template arg to allow for Streaming state tracking object
    creation and deletion w/o requiring this in each specific RPC
    hander.
- Code is more rugged by design now.

Thanks to Rafael Zalamena <rzalamena@opensourcerouting.org> for the cleanup
ideas/motivation.

Signed-off-by: Christian Hopps <chopps@labn.net>
2022-03-14 15:54:25 -04:00
Donald Sharp
341e1d6e0a
Merge pull request #10738 from LabNConsulting/chopps/fixgrpc
fixes for grpc module
2022-03-14 14:57:59 -04:00
Christian Hopps
d3074a5207 lib: grpc: use candiate ID to delete rather than pointer to candiate
- also be consistent in candidate IDs being uint64_t

Signed-off-by: Christian Hopps <chopps@labn.net>
2022-03-14 11:14:12 -04:00
Rafael Zalamena
79c681952e lib: call protobuf clean up on exit
Let's clean up the valgrind output even more by calling the protobuf
shutdown function that deallocates all library used memory.

Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
2022-03-14 11:14:12 -04:00
Donald Sharp
804013b1f9
Merge pull request #10724 from opensourcerouting/lib-rotate-logs
lib: rotate log file supplied by command line
2022-03-13 10:09:48 -04:00
David Lamparter
32addf8f7a
Merge pull request #10716 from donaldsharp/routemap_rbtree_nonuniq 2022-03-13 15:08:36 +01:00
Donald Sharp
3e553c80ba
Merge pull request #10779 from opensourcerouting/typesafe-backflip
lib: typesafe container reverse iterators
2022-03-13 09:26:26 -04:00
Donatas Abraitis
3b3b5ab350
Merge pull request #10783 from donaldsharp/bgp_zebra_nht
Bgp zebra nht
2022-03-12 21:46:13 +02:00
Donald Sharp
06e4e90132 *: When matching against a nexthop send and process what it matched against
Currently the nexthop tracking code is only sending to the requestor
what it was requested to match against.  When the nexthop tracking
code was simplified to not need an import check and a nexthop check
in b8210849b8 for bgpd.  It was not
noticed that a longer prefix could match but it would be seen
as a match because FRR was not sending up both the resolved
route prefix and the route FRR was asked to match against.

This code change causes the nexthop tracking code to pass
back up the matched requested route (so that the calling
protocol can figure out which one it is being told about )
as well as the actual prefix that was matched to.

Fixes: #10766
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2022-03-12 11:18:45 -05:00
Donald Sharp
58c05959d5 bgpd, lib, pimd: Remove sockopt_cork
sockopt_cork is a no-op function that was cleaned up
in 2017.  Since then it's still not being used.  At
this point in time there is little point in keeping a
dead function that will not be used because of vagaries
between platforms

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2022-03-12 08:21:16 -05:00
David Lamparter
643ea83be2 lib: add _last and _prev on typesafe RB/DLIST
RB-tree and double-linked-list easily support backwards iteration, and
an use case seems to have popped up.  Let's make it accessible.

Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
2022-03-12 13:23:36 +01:00
Donald Sharp
e92508a741 lib: Convert prefix_master->str to a RB Tree
The prefix_master->str data structure was a sorted
list of the prefix names.  Not that big of a deal
other than insertion and deletion is insanely expensive
when you have a large number of unique prefix-lists.

In my test config file that I discovered this,
I have 587 unique prefix lists spread out acros
~26k lines of prefix-lists.  When reading
this config file into FRR the read time goes
from 690 seconds to 650 seconds.

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2022-03-11 14:18:13 -05:00
David Lamparter
543a26848d lib: add %pFXh to print prefix w/o prefixlen
Mostly for pimd, for the time being.  May be removed again if unused.

Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
2022-03-11 13:43:19 +01:00
David Lamparter
424ec38499 lib: add JSON printfrr dict-key helper
`json_object_object_add()` adds keys/items to objects/dictionaries.
Useful to have a printfrr based variant for the key there.

Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
2022-03-11 13:43:00 +01:00
Donald Sharp
37a9cb28ab
Merge pull request #10749 from opensourcerouting/live-log-polish
lib, vtysh: apply some polish to live-log feature
2022-03-10 19:41:48 -05:00
David Lamparter
a4af82ee2b lib, vtysh: report lost messages on live log
The vtysh live logs don't try to buffer messages when vtysh isn't
reading them fast enough.  Either the kernel has space and can accept
messages without delay, or it doesn't and we continue on.

While this is intentional (otherwise slow vtysh could block a routing
daemon), at least give the user an indication if messages were dropped.

Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
2022-03-07 18:03:16 +01:00
David Lamparter
3bcdae106e lib: add monitor:<fd> command line log target
This provides direct raw log output with full metadata directly at
startup regardless of configuration details.

Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
2022-03-07 18:03:16 +01:00
David Lamparter
834585bdb9 lib: add a few more bits to live log header
... and add some comments explaining the individual fields.

Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
2022-03-07 18:03:16 +01:00
David Lamparter
da2fc19187 lib: support multiple --log options
Allow simultaneously enabling syslog, stdout and/or file logs.

Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
2022-03-07 18:03:15 +01:00
David Lamparter
2eda953a2a lib: make live log sockets non-blocking
This was the intent here to begin with, not sure where I managed to
forget this along the way...

Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
2022-03-07 18:03:15 +01:00
David Lamparter
1609a9d636 lib: fix live log fields for crashlog
The timestamps used for the live log are wallclock, not monotonic.  Also
some fields were left uninitialized.

Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
2022-03-07 18:03:13 +01:00
Donald Sharp
db0a45d0d6
Merge pull request #10741 from LabNConsulting/chopps/critfixgrpc
critical fixes for grpc
2022-03-07 11:49:48 -05:00
David Lamparter
d03440cab7 lib: fix log target removal when singlethreaded
While running singlethreaded, the RCU code is "dormant" and rcu_free is
an immediate operation.  This results in the log target loop accessing
free'd memory if a log target removes itself while a message is printed
(which is likely to happen on e.g. error conditions.)

Just use frr_each_safe to avoid this issue.

Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
2022-03-07 17:23:12 +01:00
Christian Hopps
fe095adc24 lib: grpc: fix handling of "empty" yang type
- rather than coerce `const char *` to std:string&, just pass the
C ptr, as that's what is used anyway.

fixes #10578

Signed-off-by: Christian Hopps <chopps@labn.net>
2022-03-06 12:00:22 -05:00
Christian Hopps
83f6fce7d2 lib: grpc: fix shutdown code
fixes #9732

Signed-off-by: Christian Hopps <chopps@labn.net>
2022-03-06 12:00:17 -05:00
Christian Hopps
c85ecd6405 lib: grpc: initialize uninitialized member variables
fixes #9732, fixes #10578

Signed-off-by: Christian Hopps <chopps@labn.net>
2022-03-06 07:39:58 -05:00
Christian Hopps
96d434f853 lib: grpc: do not remove candidate entry too soon
Fix from Rafael Zalamena <rzalamena@opensourcerouting.org>

Signed-off-by: Christian Hopps <chopps@labn.net>
2022-03-06 07:37:52 -05:00
Rafael Zalamena
673e440770 lib: tweak northbound gRPC default timeout
Don't let open sockets hang for too long. This will fix an issue where a
improperly coded client (e.g. socat) could exaust the amount of open
file descriptors.

Documentation:
https://grpc.github.io/grpc/cpp/md_doc_keepalive.html

Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
2022-03-06 07:37:52 -05:00
Mobashshera Rasool
a34ce5c5e4 lib: Route-map failed for OSPF routes even for matching prefixes
This issue is applicable to other protocols as well.
When user has used route-map, even though the prefixes are falling
under the permit rule, the prefixes were denied and were shown
as inactive route in zebra.

Reason being the parameter which is of type enum was passed to the api
route_map_get_index and was typecasted to uint8_t *.
This problem is visible in case of Big Endian systems because we are
accessing the most significant byte.

'match_ret' field is an enum in the caller and so it is of 4 bytes,
the typecasting it to 1 byte and passing it to the api made
the api to put the value in the most significant byte
which was already zero previously. Therefore the actual value
RMAP_NOMATCH which was 1 never gets reset in this case.
Therefore the api always returns 'RMAP_NOMATCH' and hence
the prefixes are always denied.

Fixes: #9782
Signed-off-by: Mobashshera Rasool <mrasool@vmware.com>
2022-03-04 03:57:09 -08:00
Rafael Zalamena
3c1f92018b lib: rotate log file supplied by command line
Call `zlog_file_rotate` for command file lines as well otherwise on
`SIGUSR1` the old descriptor will still be used and no new log file will
be created for the rotation.

Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
2022-03-03 18:28:08 -03:00
Donald Sharp
5f503e5f5a lib: Fix corruption when routemap delete/add sequence happens
If a operator issues a series of route-map deletions and
then re-adds, *and* this triggers the hash table to realloc
to grow to a larger size, then subsuquent route-map operations
will be against a corrupted hash table.

Why?

Effectively the route-map code was inserting each
route-map <NAME> into a hash for storage.  Upon
deletion there is this concept of delayed processing
so the routemap code sets a bit `to-be-processed`
and marks the route-map for deletion.  This is
1 entry in the hash table.  Then if the operator
recreates the hash, FRR would add another hash
entry.  If another deletion happens then there
now are 2 deletion entries that are indistinguishable
from a hash perspective.

FRR stores the deleted name of the route-map so that
any delayed processing can lookup the name and only process
those peers that are related to that route-map name.
This is good as that if in say BGP, we do not want
to reprocess all the peers that don't use the route-map.

Solution:
The whole purpose of the delay of deletion and the
storage of the route-map is to allow the using protocol
the ability to process the route-map at a later time
while still retaining the route-map name( for more efficient
reprocessing ).  The problem exists because we are keeping
multiple copies of deletion events that are indistinguishable
from each other causing hash havoc.

The truth is that we only need to keep 1 copy of the
routemap in the table.  If the series of events is:
a) delete ( schedule processing )
b) add ( reschedule processing )

Current code ends up processing the route-map two times
and in this event we really just need to reprocess everything
with the new route-map.

If the series of events is:
a) delete (schedule processing )
b) add (reschedule)
c) delete (reschedule)
d) add (reschedule)

All this really points to is that FRR just needs to keep the last
in the series of maps and ensuring that FRR knows that we need
to continue processing the route-map.  So in the creation processing
if the hash has an entry for this map, the routemap code knows that
this is a deletion event.  Mark this route-map for later processing
if it was marked so.  Also in the lookup function do not return
a map if the map found was deleted.

Fixes: #10708
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2022-03-02 15:41:54 -05:00
Donald Sharp
91f2045d53 lib: Fix zclient.c enum event to enum zclient_event
zclient.c is using `enum event` let's rename it to a better
named data structure `enum zclient_event`.

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2022-03-02 09:19:05 -05:00
Donald Sharp
55a70ffb78 lib: Rename enum event to enum vty_event
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2022-03-02 09:17:47 -05:00
Donald Sharp
4e2839de64 lib: Fix FreeBSD clock_gettime(CLOCK_THREAD_CPUTIME_ID,..) going backwards
On FreeBSD I have noticed that subsuquent calls to clock_gettime(..)
can return an after time that is before first calls value.
This in turn is generating CPU_HOG's because the subtraction
is wrapping into very very large numbers:

2022/02/28 20:12:58 SHARP: [PTDQA-70FG5]     start: 35.741981000  now: 35.740581000
2022/02/28 20:12:58 SHARP: [XK9YH-ZD8FA][EC 100663313] CPU HOG: task zclient_read (800744240) ran for 0ms (cpu time 18446744073709550ms)

(Please note I added the first line of debug to figure this issue out).

I have been asked to open a FreeBSD bug report and have done so.
In the mean time I think that it is important that FRR does
not generate bogus CPU HOG's on FreeBSD ( especially since
this may or may not be easily fixed and FRR has no control
over what version of the operating system, operators are
going to be running with FRR.

So, add a bit of specialized code that checks to see if
the after time in FreeBSD is before the now time in
thread_consumed_time and do some quick manipulations
to not have this issue.

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2022-03-01 09:08:32 -05:00
David Lamparter
2821405a69
Merge pull request #10640 from donaldsharp/thread_timers 2022-03-01 11:45:36 +01:00
Russ White
e3aa338256
Merge pull request #10566 from whichbug/master
isisd: use base64 to encode the binary data.
2022-02-28 09:44:47 -05:00
Donald Sharp
5506048083
Merge pull request #10353 from opensourcerouting/vtysh-live-log
lib & vtysh: RFC5424 syslog + vtysh live log display
2022-02-28 09:43:29 -05:00