As part of the conversion to a `struct peer_connection`, it will
be desirable to have two pointers: one for when we open a connection
and one for when we receive a connection. Start this conversion
in `struct peer`. If this sounds confusing, take a look at the BGP
state machine for connections and how it resolves this router
opening a connection -vs- this router receiving an open. At some
point the state machine decides which of the two connections we
are keeping. Future commits will use this abstraction to untangle
the peer/doppelganger duality.
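A minimal sketch of the intended shape, with field names assumed for
illustration (the real conversion lands over several commits):

```
/* Illustrative only: the peer ends up owning one connection per
 * direction, and the FSM later collapses onto whichever connection
 * wins the open-collision resolution. */
struct peer_connection {
	struct peer *peer; /* backpointer to the owning peer */
	int fd;            /* TCP socket for this connection */
};

struct peer {
	/* ... existing peer state ... */
	struct peer_connection *incoming; /* connection we accepted */
	struct peer_connection *outgoing; /* connection we initiated */
};
```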
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
The status and ostatus are a function of the `struct peer_connection`;
move them into that data structure.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Move PEER_THREAD_WRITES_ON and PEER_THREAD_READS_ON to
be a part of the `struct peer_connection`, since this is
connection-oriented data.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Move peer->t_write and peer->t_read into `struct peer_connection`,
as these are properties of the connection.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
BGP tracks connections based upon the peer, but the doppelganger
structure being created for that peer has introduced a bunch of
fragility: the peer exists independently of the connections to it.
The whole point of the doppelganger structure was to allow
BGP to both accept and initiate tcp connections and then,
when one of them reaches a `good` state, collapse into the
appropriate one. The problem is that having two peer structures
creates a situation where we have to make sure we are configuring
the `right` one and also make sure that we collapse the two
independent peer structures into one acting peer. This makes no
sense; let's abstract the peer into having two connections,
one for incoming connections and one for outgoing connections.
Then we can easily collapse down without having to do crazy
stuff, and people adding new features won't need to touch a
million places in the code.

This is the start of this abstraction. In this commit
we'll just pull out the fd and the input/output buffers
into a connection data structure, as sketched below. Future
commits will abstract further.
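A rough sketch of this first step, reusing the buffer types already
on `struct peer` (`struct stream_fifo`); the exact layout is whatever
the diff contains:

```
/* First cut of the abstraction: just the socket and the packet
 * queues move; everything else stays on struct peer for now. */
struct peer_connection {
	int fd;                   /* socket for this connection */
	struct stream_fifo *ibuf; /* packets read, awaiting processing */
	struct stream_fifo *obuf; /* packets queued to be written */
};
```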
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Commit a0b937de42 introduced the idea of an input Q packet limit.
Say you read in 635000 bytes of data and the input Q is already at
its limit (currently 1000); then when bgp_process_reads runs it will
assert because there is less than a BGP_MAX_PACKET_SIZE in ibuf_work.

Don't assert, as it's irrelevant: even if we can't read a full packet
in, the whole system keeps working, and as the input Q length comes
down we will start draining ibuf_work again and all will be fine.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
The peer->ibuf_scratch was allocating 65535 * 10 bytes
of scratch space to hold incoming data read from a peer.
When you have 4k peers this is 655,350 * 4000 =
2,621,400,000 bytes, or roughly 2.6 GB of data, which is
crazy large, especially since the i/o pthread reads per peer
without any chance of the data interfering with other reads.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
With these changes, the code ensures that the peer data structures
are accessed only after it knows that BGPD is not terminating.
Authored-by: Naveen Thanikachalam <nthanikachal@vmware.com>
Signed-off-by: Iqra Siddiqui <imujeebsiddi@vmware.com>
Effectively a massive search and replace of
`struct thread` to `struct event`. Using the
term `thread` gives people the impression that
this event system is a pthread, when it is not.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
This is the first in a series of commits whose goal is to rename
the thread system in FRR to an event system. There is a continual
problem of people confusing `struct thread` with a true
pthread; in reality, our entire thread.c is an event system.

In this commit, rename the thread.[ch] files to event.[ch].
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
BGP was modified in a0b937de42
to grab the peer->io_mtx before validating the header, to ensure
that the input Queue was not being modified by anyone else at that
moment in time. Unfortunately validate_header can detect a problem
and attempt to relock the mutex, which deadlocks. This deadlock in
the bgp_io pthread is the lone deadlock at first, but eventually
bgp attempts to write another packet to the peer (say when it's
time to send the next packet), the main pthread of bgpd becomes
deadlocked as well, and the whole bgpd process is stuck at that
point, leaving us dead in the water.

The point of locking the mutex earlier was to ensure that the input
Queue wasn't being modified by anyone else (say, being read from),
as we wanted to ensure that we don't hold more packets than necessary.
Let's grab the mutex just long enough to look at the input Q size and
confirm that we have room, then run validate_header and do the right
thing from there. We'll need to lock the mutex again when we actually
move the packet onto the input Q.
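The shape of the fix, paraphrased as a sketch (`validate_header()`,
`stream_fifo_push()` and `peer->io_mtx` are the names used in
bgp_io.c; `BGP_INQ_LIMIT` is a stand-in, and this is not the literal
diff):

```
static void bgp_enqueue_packet(struct peer *peer, struct stream *pkt)
{
	bool full;

	/* hold io_mtx only long enough to check for room */
	pthread_mutex_lock(&peer->io_mtx);
	full = peer->ibuf->count >= BGP_INQ_LIMIT;
	pthread_mutex_unlock(&peer->io_mtx);

	if (full)
		return; /* no room: leave the bytes in the TCP socket */

	if (!validate_header(peer)) /* may take io_mtx internally */
		return;

	/* re-take the mutex only for the actual enqueue */
	pthread_mutex_lock(&peer->io_mtx);
	stream_fifo_push(peer->ibuf, pkt);
	pthread_mutex_unlock(&peer->io_mtx);
}
```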
Fixes: #12725
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Add a default limit to the InQ for messages off the bgp peer
socket, and make the limit configurable via the cli.

Adding this limit causes the messages to be retained in the tcp
socket and allows tcp back pressure and congestion control to kick
in.

Before this change, we allowed the InQ to grow indefinitely, just taking
messages off the socket and adding them to the fifo queue, never letting
the kernel know we needed to slow down. We were seeing, under high loads
of messages and large perf-heavy routemaps (regex matching), that this
queue would cause a memory spike and BGP would get OOM killed. With this
change the messages stay in the socket, and that load is distributed
where it should be: in the socket buffers on both send/recv while we
handle the messages.

Also, changes were made to allow the ringbuffer to hold messages and
continue to be filled by the IO pthread while we wait for the Main
pthread to handle the work on the InQ.
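A standalone illustration of the back-pressure idea (plain C, not the
FRR implementation; `INQ_LIMIT` stands in for the new configurable
limit):

```
#include <unistd.h>

#define INQ_LIMIT 1000 /* stand-in for the configurable InQ limit */

static size_t inq_count; /* messages queued for the Main pthread */

static void io_read(int fd, char *buf, size_t len)
{
	/* Once the InQ is full, simply stop reading: unread bytes pile
	 * up in the kernel socket buffer, the TCP window closes, and
	 * the sender is forced to slow down. */
	if (inq_count >= INQ_LIMIT)
		return;

	ssize_t n = read(fd, buf, len);
	if (n > 0)
		inq_count++; /* queue one message (simplified) */
}
```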
Memory spike seen with large numbers of routes flapping and route-maps
with dozens of regex matching:
```
Memory statistics for bgpd:
System allocator statistics:
Total heap allocated: > 2GB
Holding block headers: 516 KiB
Used small blocks: 0 bytes
Used ordinary blocks: 160 MiB
Free small blocks: 3680 bytes
Free ordinary blocks: > 2GB
Ordinary blocks: 121244
Small blocks: 83
Holding blocks: 1
```
With most of it being held by the inQ (seen in the stream data structure info here):
```
Type : Current# Size Total Max# MaxBytes
...
...
Stream : 115543 variable 26963208 15970740 3571708768
```
With this change that memory is capped and load is left in the sockets:
RECV Side:
```
State Recv-Q Send-Q Local Address:Port Peer Address:Port Process
ESTAB 265350 0 [fe80::4080:30ff:feb0:cee3]%veth1:36950 [fe80::4c14:9cff:fe1d:5bfd]:179 users:(("bgpd",pid=1393334,fd=26))
skmem:(r403688,rb425984,t0,tb425984,f1816,w0,o0,bl0,d61)
```
SEND Side:
```
State Recv-Q Send-Q Local Address:Port Peer Address:Port Process
ESTAB 0 1275012 [fe80::4c14:9cff:fe1d:5bfd]%veth1:179 [fe80::4080:30ff:feb0:cee3]:36950 users:(("bgpd",pid=1393443,fd=27))
skmem:(r0,rb131072,t0,tb1453568,f1916,w1300612,o0,bl0,d0)
```
Signed-off-by: Stephen Worley <sworley@nvidia.com>
The "bgp_notify_" apis in bgp_packet.c generate a notification
to a peer, usually during error handling. The io pthread wants
to send notifications in a couple of cases during early
received-packet validation - but the existing api interacts
with the peer struct itself, and that's not safe.
Add a new api for use by the io pthread, and adjust the main
notify api so that it can avoid touching the peer struct.
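A sketch of the split (assuming a name along these lines; the exact
signature is whatever the commit adds):

```
/* Build and enqueue a NOTIFICATION from the io pthread without
 * touching main-pthread-owned state in struct peer. */
void bgp_notify_io_invalid(struct peer *peer, uint8_t code,
			   uint8_t sub_code, uint8_t *data,
			   size_t datalen);
```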
Signed-off-by: Mark Stapp <mstapp@nvidia.com>
Let's convert to our actual library call instead
of using yet another abstraction that makes it fun
for people to switch daemons.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
convert:
frr_with_mutex(..)
to:
frr_with_mutex (..)
To make all our code agree with what clang-format is going to produce
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
As described by
https://www.ietf.org/archive/id/draft-spaghetti-idr-bgp-sendholdtimer-04.html
Since this replicates the HoldTime check on the receiver that is already
part of the protocol, I do not believe it necessary to wait for IETF
progress on this draft. It's just replicating an existing element of
the protocol at the other side of the session.
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
In bgp_io.c, upon a read error we store the peer pointer on a
thread to call bgp_packet_process_error. The event generated this
way is not guaranteed to run immediately: it could fire *after* the
peer data structure is deleted, at which point we are writing into
memory that we no longer own as a peer data structure.

Modify the code so that the peer tracks the thread associated with
the read error and can cancel that thread when the peer data
structure is deleted, as sketched below.
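A sketch of the fix (the `t_process_packet_error` field name is
assumed for illustration; `thread_add_event()` and `THREAD_OFF()` are
the library calls of this era):

```
/* struct peer gains a pointer to the scheduled error-handling task */
struct thread *t_process_packet_error;

/* bgp_io.c: remember the task when scheduling it */
thread_add_event(bm->master, bgp_packet_process_error, peer, code,
		 &peer->t_process_packet_error);

/* peer_delete(): cancel the task before the peer memory goes away */
THREAD_OFF(peer->t_process_packet_error);
```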
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
As pointed out in code review of BGP extended messages, increasing the
maximum BGP message size has the consequence of growing the dynamically
sized stack buffer up to 650K. While unlikely to exceed modern stack
sizes, it is still unreasonably large. Remedy this with a heap buffer.
Signed-off-by: Quentin Young <qlyoung@nvidia.com>
Compiler warns about uninitialized value, although in practice it is
unreachable.
Also updates a function comment explaining what that value does.
Signed-off-by: Quentin Young <qlyoung@nvidia.com>
Add a handler for socket errors that runs in the main pthread,
rather than the io pthread. When the io pthread encounters a
read error, capture the error and schedule a task for the main
pthread.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
Use the new ringbuffer API function to read from file descriptors
directly into the ringbuffer instead of using intermediary buffers.
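The change in shape, sketched (assuming the new helper looks something
like `ringbuf_read()`; `ringbuf_put()` is the existing lib/ringbuf
API):

```
/* before: two copies, socket -> scratch buffer -> ring buffer */
nbytes = read(peer->fd, scratch, readsize);
ringbuf_put(ibuf_work, scratch, nbytes);

/* after: one copy, socket -> ring buffer directly */
nbytes = ringbuf_read(ibuf_work, peer->fd, readsize);
```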
Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
* Process the FIB update in bgp_zebra_route_notify_owner() and call
group_announce_route() if the route is installed (see the sketch
after this list).
* When a bgp update is received for a route which was not installed
earlier (flag BGP_NODE_FIB_INSTALLED is not set) and suppress-fib is
enabled, set the flag BGP_NODE_FIB_INSTALL_PENDING to indicate that
fib install is pending for the route. The route will be advertised
when zebra sends the ZAPI_ROUTE_INSTALLED status.
* The advertisement delay (BGP_DEFAULT_UPDATE_ADVERTISEMENT_TIME)
is added to allow more routes to be sent in a single update message.
This is required since zebra sends a route notify message for each
route. The delay is applied to the update group timer which advertises
routes to peers.
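A self-contained sketch of the resulting flow (types and helpers are
simplified stand-ins, not FRR's real signatures):

```
#include <stdint.h>

#define BGP_NODE_FIB_INSTALLED       0x01
#define BGP_NODE_FIB_INSTALL_PENDING 0x02

struct bgp_dest {
	uint32_t flags;
};

static void group_announce_route(struct bgp_dest *dest)
{
	/* advertise the route to the update group's peers */
}

/* called when zebra reports ZAPI_ROUTE_INSTALLED for this route */
static void route_notify_installed(struct bgp_dest *dest)
{
	dest->flags |= BGP_NODE_FIB_INSTALLED;
	dest->flags &= ~BGP_NODE_FIB_INSTALL_PENDING;

	/* with suppress-fib, the route is advertised only now, after
	 * the batched advertisement-delay timer fires */
	group_announce_route(dest);
}
```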
Signed-off-by: kssoman <somanks@gmail.com>
- tracepoint() -> frrtrace()
- tracelog() -> frrtracelog()
- tracepoint_enabled() -> frrtrace_enabled()
Also removes copypasta'd #ifdefs for those LTTng macros, those are
handled in lib/trace.h
Signed-off-by: Quentin Young <qlyoung@nvidia.com>
Replace all lib/thread cancel macros and use thread_cancel()
everywhere. Only the THREAD_OFF macro and the thread_cancel() api are
supported. Also adjust thread_cancel_async() to NULL the caller's
pointer (if present).
Signed-off-by: Mark Stapp <mjs@voltanet.io>
When a received packet is processed in bgp_process_reads(), the data
is copied to a static buffer and then copied to the stream buffer.
The data can be copied directly to the stream buffer, which avoids
the extra memcpy.
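Sketched before/after (`stream_read_try()` is used purely to
illustrate reading straight into a `struct stream`; the commit's
actual helper may differ):

```
/* before: socket -> static buffer -> stream (two copies) */
nbytes = read(peer->fd, scratch, size);
stream_put(s, scratch, nbytes);

/* after: socket -> stream (one copy) */
nbytes = stream_read_try(s, peer->fd, size);
```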
Signed-off-by: kssoman <somanks@gmail.com>
Problem: during a bgp tcp connection collision. When BGP peering is
configured between two BGP routers, both routers create a peer
structure when they receive each other's open message. In this event
both speakers open duplicate TCP sessions and send OPEN messages on
each socket simultaneously, and the BGP Identifier is used to resolve
which socket should be closed. If BGP GR is enabled, the old tcp
session is dumped and the new session is retained. While this transfer
of connection is happening, if all the bgp gr config is not migrated
to the new connection, the new bgp gr mode will never get applied.

Fix Summary:
1. Replicate GR configuration from the old session to the new session
in bgp_accept().
2. Replicate GR configuration from the stub to the full-fledged peer
in bgp_establish().
3. Disable all NSF flags, clear stale routes (if present), and stop
the restart & stale timers (if they are running) when the bgp GR mode
is changed to "Disabled".
4. Disable the R-bit in the capability if it is not set in the
received open message.
Signed-off-by: Biswajit Sadhu <sadhub@vmware.com>
bgp_process_packets has an assert to make sure an appropriate amount of
working space in the input buffer has been freed up for future reads.
However, this assert shouldn't fire when we have encountered an error
that's going to tear down the session, because in that case we may not
be able to process the full contents of the input buffer.
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>