bgp->default_keepalive was not considered when setting
peer->v_keepalive, causing the effective keepalive interval to
always be (holdtime/3), even when default_keepalive < (holdtime/3).
This ensures that the default_keepalive is used when it's set and
is < (holdtime/3).
Signed-off-by: Trey Aspelund <taspelund@cumulusnetworks.com>
(cherry picked from commit d8bf8c6128f2e493d473148213bd663a500c7f73)
The RFC states:
The BGP Identifier is a 4-octet, unsigned, non-zero integer that
should be unique within an AS. The value of the BGP Identifier
for a BGP speaker is determined on startup and is the same for
every local interface and every BGP peer.
We were going slightly beyond this and ensuring that the address
was a specific range of addresses which is no longer relevant.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
* Fixed integration in FSM and packet handling.
* Added CLI "show" output, incl. JSON.
* For review and testing only.
Signed-off-by: David Schweizer <dschweizer@opensourcerouting.org>
Remove mid-string line breaks, cf. workflow doc:
.. [#tool_style_conflicts] For example, lines over 80 characters are allowed
for text strings to make it possible to search the code for them: please
see `Linux kernel style (breaking long lines and strings)
<https://www.kernel.org/doc/html/v4.10/process/coding-style.html#breaking-long-lines-and-strings>`_
and `Issue #1794 <https://github.com/FRRouting/frr/issues/1794>`_.
Scripted commit, idempotent to running:
```
python3 tools/stringmangle.py --unwrap `git ls-files | egrep '\.[ch]$'`
```
Signed-off-by: David Lamparter <equinox@diac24.net>
Issue: bgp_process_writes will be called when the fd is writable.
And it will bgp_generate_updgrp_packets to generate the
update packets no matter MRAI is set or not.
Fix: bgp_generate_updgrp_packets thread will return without sending
any update when MRAI timer is still running.
Signed-off-by: Richard Wu <wutong23@baidu.com>
Replace sprintf with snprintf where straightforward to do so.
- sprintf's into local scope buffers of known size are replaced with the
equivalent snprintf call
- snprintf's into local scope buffers of known size that use the buffer
size expression now use sizeof(buffer)
- sprintf(buf + strlen(buf), ...) replaced with snprintf() into temp
buffer followed by strlcat
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
Some were converted to bool, where true/false status is needed.
Converted to void only those, where the return status was only false or true.
Signed-off-by: Donatas Abraitis <donatas.abraitis@gmail.com>
Some code in bgp_route_refresh_receive was spread across several
lines because of an end of line commit. Move comment to a place
to allow better formating.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Currently During bgp open collision resolution if both
the router-id's are the same, we correctly follow
the RFC and close the connection. The problem is of course
that there is no notification of the error in configuration
to the end user other than a subtle open debug message.
Explicitly call out the miss-configuration as an error message
as that this miss-config took several hours of debugging to notice.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
*Adding helper caller hooks function for signalling from BGPD
to ZEBRA to enable or disable GR feature in ZEBRA depending
on bgp per peer gr configuration.
Signed-off-by: Biswajit Sadhu <sadhub@vmware.com>
*After a restarting router comes up and the bgp session is
successfully established with the peer. If the restarting
router doesn’t have any route to send, it send EOR to
the peer immediately before receiving updates from its peers.
*Instead the restarting router should send EOR, if the
selection deferral timer is not running OR count of eor received
and eor required are matches then send EOR.
Signed-off-by: Biswajit Sadhu <sadhub@vmware.com>
BGP disable EOR sending is a useful command for testing various
scenarios of BGP graceful restart.
* Added the hidden CLI command : bgp graceful-restart disable-eor
* The CLI will not be displayed in "show running-config" and will not
be stored in configuration file.
* When enabled, EOR will not be sent to peer
Signed-off-by: Biswajit Sadhu <sadhub@vmware.com>
Signed-off-by: Soman K S <somanks@vmware.com>
* Changing GR mode on a router needs a session reset from the
SAME router to negotiate new GR capability.
* The present GR implementation needs a session reset after every
new BGP GR mode change.
* When BGP session reset happens due to sending or receiving BGP
notification after changing BGP GR mode, there is no need of
explicit session reset.
Signed-off-by: Biswajit Sadhu <sadhub@vmware.com>
* Selection Deferral Timer for Graceful Restart.
* Added selection deferral timer handling function.
* Route marking as selection defer when update message is received.
* Staggered processing of routes which are pending best selection.
* Fix for multi-path test case.
Signed-off-by: Biswajit Sadhu <sadhub@vmware.com>
'NOTIFICATION' string in this message incorrectly implies a BGP
Notification message was the cause of this log. Removing it to
reduce confusion and replacing with function name.
Signed-off-by: Trey Aspelund <taspelund@cumulusnetworks.com>
frr_with_mutex(...) { ... } locks and automatically unlocks the listed
mutex(es) when the block is exited. This adds a bit of safety against
forgetting the unlock in error paths & co. and makes the code a slight
bit more readable.
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
This is the initial BMP skeleton from Yasuhiro Ohara.
(License/Signoff note: code published on github as GPLv2+.)
Signed-off-by: David Lamparter <equinox@diac24.net>
In a number of places, the JSON output had invalid key names for
AFI/SAFI. For example, the key name in JSON was "IPv4 Unicast" which
is invalid as a JSON Key name. Many JSON tools such as those used in
Ansible, jq etc. all fail to parse the output in these scenarios. The
valid name is ipv4Unicast. There's already a routine afi_safi_json()
defined to handle this change, but it was not consistently called.
The non-JSON version was called afi_safi_print() and it merely returned
the CLI version of the string, didn't print anything.
This patch deals with this issue by:
- Renaming afi_safi_print to get_afi_safi_str()
- get_afi_safi_str takes an additional param, for_json which if true
will return the JSON-valid string
- Renaming afi_safi_json to get_afi_safi_json_str()
- Creating a new routine get_afi_safi_vty_str() for printing to vty
- Consistently using get_afi_safi_str() with the appropriate for_json
value
Signed-off-by: Dinesh G Dutt <5016467+ddutt@users.noreply.github.com>
Unlike MRT dumps, BMP also provides packets sent by the router. Add
another hook for that.
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
The MRT dump code is already hooked in at the right places to write out
packets; the BMP code needs exactly the same access so let's make this
a hook.
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
These counters are accessible through BMP and may be useful to monitor
bgpd. A CLI to show them could also be added if people are interested.
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
last_reset_cause_size is the length *used* in last_reset_cause[]. It's
straight up used wrong here; we're saving off a reset cause and need to
check against the *available* size in last_reset_cause[].
This could actually have led to (hopefully rare) crashes in the assert
there, since the assert condition might fail incorrectly.
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
notify_data_remote_as4 would contain garbage if optlen == 0, and also
as4 is in host byte order while the notify needs network byte order.
Signed-off-by: David Lamparter <equinox@diac24.net>
Modify the code such that we can auto turn the iana values of afi
and safi to pleasant to read strings.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
This is causing interop issues with vendors. According to the RFC,
receiver should ignore the NEXT_HOP attribute with MP_REACH_NLRI
present.
Signed-off-by: nikos <ntriantafillis@gmail.com>
The End of Rib notification in BGP is useful to know no matter
the circumstances. So change this from a debug message to
an info and cleanup the message a bit and add vrf we are in.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
This commit implements BGP peer-group overrides for the timer flags,
which control the value of the hold, keepalive, advertisement-interval
and connect connect timers. It was kept separated on purpose as the
whole timer implementation is quite complex and merging this commit
together with with the other flag implementations did not seem right.
Basically three new peer flags were introduced, namely
*PEER_FLAG_ROUTEADV*, *PEER_FLAG_TIMER* and *PEER_FLAG_TIMER_CONNECT*.
The overrides work exactly the same way as they did before, but
introducing these flags made a few conditionals simpler as they no
longer had to compare internal data structures against eachother.
Last but not least, the test suite has been adjusted accordingly to test
the newly implemented flag overrides.
Signed-off-by: Pascal Mathis <mail@pascalmathis.com>
clang-analyze complains that data may be null, and since we didn't
explicitly check it (although we did check the overall packet length
minus the header length) it has a point.
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
This work is derived from a work done by China-Telecom.
That initial work can be found in [0].
As the gap between frr and quagga is important, a reworks has been
done in the meantime.
The initial work consists of bringing the following:
- Bringing the client side of flowspec.
- the enhancement of address-family ipv4/ipv6 flowspec
- partial data path handling at reception has been prepared
- the support for ipv4 flowspec or ipv6 flowspec in BGP open messages,
and the internals of BGP has been done.
- the memory contexts necessary for flowspec has been provisioned
In addition to this work, the following has been done:
- the complement of adaptation for FS safi in bgp code
- the code checkstyle has been reworked so as to match frr checkstyle
- the processing of IPv6 FS NLRI is prevented
- the processing of FS NLRI is stopped ( temporary)
[0] https://github.com/chinatelecom-sdn-group/quagga_flowspec/
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Signed-off-by: jaydom <chinatelecom-sdn-group@github.com>
The following types are nonstandard:
- u_char
- u_short
- u_int
- u_long
- u_int8_t
- u_int16_t
- u_int32_t
Replace them with the C99 standard types:
- uint8_t
- unsigned short
- unsigned int
- unsigned long
- uint8_t
- uint16_t
- uint32_t
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
We lock and set peer->bgp at peer creation and only
remove it at deletion. Therefore these tests are
not needed.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
If a BGP message header fails validation we send a BGP NOTIFICATION from
the I/O thread. At this time we clear the output buffer, push a
NOTIFICATION and then call the manual write function for errors. But in
between the push and the write the main thread could have pushed some
other message. Thus we need to hold the lock for the duration of the
function. TOCTTOU.
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
With the way things are set up, this bit of code would never actually
cause a deadlock, but would be highly likely in the future.
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
After a batch of generated UPDATEs, call bgp_writes_on() once instead of
after generating each packet.
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
The subgroup coalesce timer controls how long updates to a particular
subgroup are delayed in order to allow additional peers to join the
subgroup. Presently the timer value is 200 ms. Increase it to 1 second
and adjust up as peers are configured, with an upper cap at 10s.
This cuts convergence time by a factor of 3 at large scale (300+ peers,
1000+ prefixes per peer).
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
This is necessary because otherwise between the time we wipe the output
buffer and the time we push the NOTIFY onto it, the KA generation thread
could have pushed a KEEPALIVE in the middle.
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
In the same vein as the round-robin input commit, this re-adds logic for
limiting the amount of time spent generating UPDATEs per generation
cycle. Missed this when shifting around wpkt_quanta; prior to MT it
limited both calls to write() as well as UPDATE generation.
Unfortunately, batching input processing severely impacts BGP initial
convergence times. As a consequence of the way update-groups were
implemented, advancing the state of the routing table based on prefixes
learned from one peer prior to all (or at least most) peers establishing
connections will cause us to start generating outbound UPDATEs, which is
a very expensive operation at present. This intensive processing starves
out bgp_accept(), delaying connection of additional peers. When
additional peers do connect the problem gets worse and worse, yielding
approximately exponential growth in convergence time dependent on both
peering and prefix counts. This behavior is present pre-multithreading
as well, but batched input exacerbates it.
Round-robin input processing marginally harms convergence times for
small topologies but should allow much larger topologies to function
within reasonable performance thresholds.
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
Different places scheduling the same thread should use the same
semantics and thread type. Additionally providing the back reference
here makes sure we only schedule the job once and avoids flooding the
event queue with jobs to process an empty buffer.
Apparently I didn't fully understand how subgroup packets make their way
out to individual peers. Turns out (on the base branch) we just busy
poll while waiting for packets to make their way onto subgroup queues.
While this needs to be fixed in the future, for now readding this logic
fixes performance issues with convergence.
Per previous work to ensure all FSM state is updated after processing
each message, read-quanta should be safe to set > 1.
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
Despaghettification of bgp_packet.c and bgp_fsm.c
Sometimes we call bgp_event_update() inline packet parsing.
Sometimes we post events instead.
Sometimes we increment packet counters in the FSM.
Sometimes we do it in packet routines.
Sometimes we update EOR's in FSM.
Sometimes we do it in packet routines.
Fix the madness.
bgp_process_packet() is now the centralized place to:
- Update message counters
- Execute FSM events in response to incoming packets
FSM events are now executed directly from this function instead of being
queued on the thread_master. This is to ensure that the FSM contains the
proper state after each packet is parsed. Otherwise there could be race
conditions where two packets are parsed in succession without the
appropriate FSM update in between, leading to session closure due to
receiving inappropriate messages for the current FSM state.
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
bgpd supports setting a write-quanta that serves as a hint on how many
packets to write per I/O cycle. Now that input is buffered, it makes
sense to add the equivalent parameter for how many packets are processed
per cycle. This is *not* how many packets are read off the wire per I/O
cycle; rather it is how many packets are processed from the input buffer
in a given cycle after having been read off the wire and sanitized.
Since these values must be used from multiple threads, they have also
been made atomic.
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
Instead of reading a packet header and the rest of the packet in two
separate i/o cycles, instead read a chunk of data at one time and then
parse as many packets as possible out of the chunk.
Also changes bgp_packet.c to batch process packets.
To avoid thrashing on useless mutex locks, the scheduling call for
bgp_process_packet has been changed to always succeed at the cost of no
longer being cancel-able. In this case this is acceptable; following the
pattern of other event-based callbacks, an additional check in
bgp_process_packet to ignore stray events is sufficient. Before deleting
the peer all events are cleared which provides the requisite ordering.
XXX: chunk hardcoded to 5, should use something similar to wpkt_quanta
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
* Move and modify all network input related code to bgp_io.c
* Add a real input buffer to `struct peer`
* Move connection initialization to its own thread.c task instead of
piggybacking off of bgp_read()
* Tons of little fixups
Primary changes are in bgp_packet.[ch], bgp_io.[ch], bgp_fsm.[ch].
Changes made elsewhere are almost exclusively refactoring peer->ibuf to
peer->curr since peer->ibuf is now the true FIFO packet input buffer
while peer->curr represents the packet currently being processed by the
main pthread.
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
After implement threading, bgp_packet.c was serving the double purpose
of consolidating packet parsing functionality and handling actual I/O
operations. This is somewhat messy and difficult to understand. I've
thus moved all code and data structures for handling threaded packet
writes to bgp_io.[ch].
Although bgp_io.[ch] only handles writes at the moment to keep the noise
on this commit series down, for organization purposes, it's probably
best to move bgp_read() and its trappings into here as well and
restructure that code so that read()'s happen in the pthread and packet
processing happens on the main thread.
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
If write() indicates that we should retry, just move along to the next
peer and come back later. No need to burn write() in a loop.
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
Changes all synchronization primitives to be dynamically allocated. This
should help catch any subtle errors in pthread lifecycles.
This change also pre-initializes synchronization primitives before
threads begin to run, eliminating a potential race condition that
probably would have caused a segfault on startup on a very fast box.
Also changes mutex and condition variable allocations to use
MTYPE_PTHREAD and updates tests to do the proper initializations.
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
* Remove t_write
* Remove t_keepalive
These have been replaced by pthreads and are no longer needed. Since
some code looks at these values to determine if the threads are
scheduled, also add a new bitfield to store the same information.
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>