First, make no change to `hash_get()` calls with a NULL
`alloc_func`.
Focus only on `hash_get()` calls with a non-NULL
`alloc_func`.
Since `hash_get()` with a non-NULL `alloc_func` parameter
cannot fail, its return value is never NULL and can
simply be ignored.
So in this case, remove the unnecessary NULL check on
the return value and add a `(void)` cast in front of
the call.
Importantly, also make no change to these two cases with a
non-NULL `alloc_func`:
1) `assert(<returned_data> == <searching_data>)` is used to
ensure the node was created, not found.
See `isis_vertex_queue_insert()` in isisd; there
are many examples of this case in isisd.
2) `<returned_data> != <searching_data>` is used to detect
a found node, and <searching_data> is then freed.
See `aspath_intern()` in bgpd; there are many
examples of this case in bgpd.
Here, <returned_data> is the value returned by `hash_get()`,
and <searching_data> is the data to be put into the
hash table.
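The cases above can be sketched with a toy stand-in. Everything named `toy_*` below is hypothetical and is not FRR's `hash_get()` implementation; it only mimics the contract that a call with a non-NULL `alloc_func` returns either the found node or the newly created one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy stand-in for a hash table: a small array of int pointers.
 * Hypothetical; the real struct hash is far more elaborate. */
#define TOY_SLOTS 8

struct toy_hash {
	int *slot[TOY_SLOTS];
	size_t count;
};

/* With a non-NULL alloc_func this either finds an equal entry or
 * inserts a new one, so it does not return NULL (as long as the toy
 * table has room). With a NULL alloc_func it is a pure lookup. */
static int *toy_hash_get(struct toy_hash *h, int *data,
			 int *(*alloc_func)(int *))
{
	for (size_t i = 0; i < h->count; i++)
		if (*h->slot[i] == *data)
			return h->slot[i]; /* found node */

	if (alloc_func == NULL || h->count == TOY_SLOTS)
		return NULL; /* lookup only, or toy table full */

	h->slot[h->count] = alloc_func(data);
	return h->slot[h->count++]; /* created node */
}

/* alloc_func that stores the caller's object itself. */
static int *toy_alloc_intern(int *data)
{
	return data;
}

/* Plain case: the result is not needed, so discard it explicitly. */
static void toy_insert(struct toy_hash *h, int *data)
{
	(void)toy_hash_get(h, data, toy_alloc_intern);
}

/* Case 2: intern a heap value and free the duplicate when an equal
 * node was already present. */
static int *toy_intern(struct toy_hash *h, int value)
{
	int *search = malloc(sizeof(*search));
	int *found;

	assert(search != NULL);
	*search = value;
	found = toy_hash_get(h, search, toy_alloc_intern);
	if (found != search)
		free(search); /* a found node, not a created one */
	return found;
}
```

Calling `toy_intern()` twice with the same value returns the same pointer, and the second temporary is freed.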
Signed-off-by: anlan_cs <vic.lan@pica8.com>
When EVPN prefix route with a gateway IP overlay index is imported into the IP
vrf at the ingress PE, BGP nexthop of this route is set to the gateway IP.
For this vrf route to be valid, the following conditions must be met:
- Gateway IP nexthop of this route should be L3 reachable, i.e., this route
should be resolved in RIB.
- A remote MAC/IP route should be present for the gateway IP address in the
EVI(L2VPN table).
To check for the first condition, gateway IP is registered with nht (nexthop
tracking) to receive the reachability notifications for this IP from zebra RIB.
If the gateway IP is reachable, zebra sends the reachability information (i.e.,
nexthop interface) for the gateway IP.
This nexthop interface should be the SVI interface.
Now, to find out type-2 route corresponding to the gateway IP, we need to fetch
the VNI for the above SVI.
To do this VNI lookup efficiently, define a hashtable of struct bgpevpn with
svi_ifindex as key.
struct hash *vni_svi_hash;
An EVI instance is added to vni_svi_hash if its svi_ifindex is nonzero.
Using this hash, we obtain struct bgpevpn corresponding to the gateway IP.
For gateway IP overlay index recursive lookup, once we find the correct EVI, we
have to lookup its route table for a MAC/IP prefix. As we have to iterate the
entire route table for every lookup, this lookup is expensive. We can optimize
this lookup by adding all the remote IP addresses in a hash table.
The following hash table is defined for this purpose in struct bgpevpn:
struct hash *remote_ip_hash;
When a MAC/IP route is installed in the EVI table, it is also added to
remote_ip_hash.
It is possible to have multiple MAC/IP routes with the same IP address because
of host move scenarios. Thus, for every address addr in remote_ip_hash, we
maintain list of all the MAC/IP routes having addr as their IP address.
Following structure defines an address in remote_ip_hash.
struct evpn_remote_ip {
struct ipaddr addr;
struct list *macip_path_list;
};
A Boolean field is added to struct bgp_nexthop_cache to indicate that the
nexthop is EVPN gateway IP overlay index.
bool is_evpn_gwip_nexthop;
A flag BGP_NEXTHOP_EVPN_INCOMPLETE is added to struct bgp_nexthop_cache.
This flag is set when the gateway IP is L3 reachable but not yet resolved by a
MAC/IP route.
Following table explains the combination of L3 and L2 reachability w.r.t.
BGP_NEXTHOP_VALID and BGP_NEXTHOP_EVPN_INCOMPLETE flags
*                | MACIP resolved | MACIP unresolved
*----------------|----------------|------------------
* L3 reachable   | VALID      = 1 | VALID      = 0
*                | INCOMPLETE = 0 | INCOMPLETE = 1
*----------------|----------------|------------------
* L3 unreachable | VALID      = 0 | VALID      = 0
*                | INCOMPLETE = 0 | INCOMPLETE = 0
Procedure that we use to check if the gateway IP is resolvable by a MAC/IP
route:
- Find the EVI/L2VRF that belongs to the nexthop SVI using vni_svi_hash.
- Check if the gateway IP is present in remote_ip_hash in this EVI.
When the gateway IP is L3 reachable and it is also resolved by a MAC/IP route,
unset BGP_NEXTHOP_EVPN_INCOMPLETE flag and set BGP_NEXTHOP_VALID flag.
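The flag combinations in the table can be sketched as a small pure function. The bit values and the helper name `toy_evpn_gwip_flags()` are assumptions made for illustration; they are not taken from bgpd.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit values are illustrative assumptions, not bgpd's definitions. */
#define BGP_NEXTHOP_VALID           (1 << 0)
#define BGP_NEXTHOP_EVPN_INCOMPLETE (1 << 1)

/* Hypothetical helper: recompute the two flags from L3 reachability
 * (reported by zebra) and L2 resolution (gateway IP present in
 * remote_ip_hash of the EVI found through vni_svi_hash). */
static uint8_t toy_evpn_gwip_flags(uint8_t flags, bool l3_reachable,
				   bool macip_resolved)
{
	flags &= (uint8_t)~(BGP_NEXTHOP_VALID | BGP_NEXTHOP_EVPN_INCOMPLETE);

	if (l3_reachable && macip_resolved)
		flags |= BGP_NEXTHOP_VALID;
	else if (l3_reachable)
		flags |= BGP_NEXTHOP_EVPN_INCOMPLETE;
	/* L3 unreachable: both flags stay cleared. */

	return flags;
}
```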
Signed-off-by: Ameya Dharkar <adharkar@vmware.com>
We are inconsistently using peer_established(peer),
sometimes using `peer->status == Established` instead. Just convert
over to using the function for consistency.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
bgp is currently registering v6 LL as nexthops to be tracked
from zebra. This presents several problems.
a) zebra does not properly track multiple prefixes that match
the same route at this point in time.
b) BGP was receiving nexthops that were just incorrect because
of (a).
c) When a nexthop changed that really didn't affect the v6 LL,
we were responding incorrectly because of this.
Modify the code such that bgp nexthop tracking notices that
we are trying to register a v6 LL. When we do so, shortcut
and watch interface up/down events for this v6 LL and do
the work when an interface goes up / down for this type
of tracking.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Some of the `show memory` strings in bgp are longer than the
columns we have allocated for it. Shorten some strings to
make them fit and have the output pleasing to the eye.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
The tip hash is only used when we are dealing with
evpn. In bgp_nexthop_self we are doing a memset
irrespective of whether we will ever find data. Yes,
hash_lookup will return pretty quickly.
Modify the code to avoid doing a memset in the case
where the tip hash is empty, since we know we'll
never find anything. With full BGP feeds this
small memset does take some time.
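A minimal sketch of the early exit, using a toy table in place of the tip hash. The `toy_*` types and the linear scan are assumptions for illustration; the point is only that the key is not built when the table is empty.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical lookup key and table standing in for the tip hash. */
struct toy_tip_addr {
	unsigned char bytes[16];
};

struct toy_tip_table {
	size_t count;
	struct toy_tip_addr entries[4];
};

static bool toy_tip_lookup(const struct toy_tip_table *t,
			   const unsigned char *addr, size_t len)
{
	struct toy_tip_addr key;

	/* Empty table: nothing can match, so skip building the key. */
	if (t->count == 0)
		return false;

	if (len > sizeof(key.bytes))
		return false;

	memset(&key, 0, sizeof(key));
	memcpy(key.bytes, addr, len);

	for (size_t i = 0; i < t->count; i++)
		if (memcmp(&t->entries[i], &key, sizeof(key)) == 0)
			return true;

	return false;
}
```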
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Since the addition of srte_color to the comparison for bgp nexthops,
it is possible to have several nexthops per prefix. But since zebra
only stores a per-prefix registration, we should not unregister for
nh notifications for a prefix until all the nexthops for that prefix
have been deleted. Otherwise we can get into a deadlock situation
where BGP thinks we have registered but we have unregistered from zebra.
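One way to picture the rule is a per-prefix user count. This is a sketch under the assumption of an explicit counter; the `toy_*` names are hypothetical, and the actual change may instead check for other nexthop entries with the same prefix.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bookkeeping for one prefix registered with zebra. */
struct toy_prefix_reg {
	uint32_t prefix;   /* prefix the registration is for */
	unsigned nexthops; /* nexthop entries (one per color) using it */
	bool registered;   /* do we currently hold a zebra registration? */
};

static void toy_nexthop_add(struct toy_prefix_reg *r)
{
	if (r->nexthops++ == 0)
		r->registered = true; /* first user registers */
}

static void toy_nexthop_del(struct toy_prefix_reg *r)
{
	if (r->nexthops == 0)
		return;
	if (--r->nexthops == 0)
		r->registered = false; /* last user unregisters */
}
```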
Signed-off-by: Pat Ruddy <pat@voltanet.io>
Extend the NHT code so that only the relevant BGP routes are affected
whenever an SR-policy is updated on zebra.
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
First, routing tables aren't the most appropriate data structure
to store nexthops and imported routes since we don't need to do
longest prefix matches with that information.
Second, by converting the NHT code to use rb-trees, we can index
the nexthops using additional information, not only the destination
address. This will be useful later to index bgpd's nexthops by
both destination and SR-TE color.
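A comparison function of the kind an rb-tree needs might look like the sketch below. The struct layout and field names are assumptions for illustration; only the idea of ordering by destination first and SR-TE color second comes from the text.

```c
#include <stdint.h>

/* Hypothetical, simplified nexthop cache key (IPv4 only). */
struct toy_bnc {
	uint32_t addr;       /* destination address, host byte order */
	uint8_t prefixlen;   /* destination prefix length */
	uint32_t srte_color; /* SR-TE color, 0 when unused */
};

/* Total order: destination first, then color as a tie-breaker. */
static int toy_bnc_cmp(const struct toy_bnc *a, const struct toy_bnc *b)
{
	if (a->addr != b->addr)
		return a->addr < b->addr ? -1 : 1;
	if (a->prefixlen != b->prefixlen)
		return a->prefixlen < b->prefixlen ? -1 : 1;
	if (a->srte_color != b->srte_color)
		return a->srte_color < b->srte_color ? -1 : 1;
	return 0;
}
```

With such an ordering, two entries for the same destination but different colors are distinct tree nodes, which a longest-prefix-match table keyed on the prefix alone cannot express.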
Co-authored-by: Sebastien Merle <sebastien@netdef.org>
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
This is the bulk part extracted from "bgpd: Convert from `struct
bgp_node` to `struct bgp_dest`". It should not result in any functional
change.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
Display next hop resolution information, whether the "detail" option is
specified or not as it is quite fundamental and only minimally increases
the output.
Introduce option to look at a specific NHT entry, which will also show
the paths associated with that entry.
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
Add new function `bgp_node_get_prefix()` and modify
the bgp code base to use it.
This is prep work for the struct bgp_dest rework.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Some functions were converted to return bool, where a true/false status is needed.
Only those whose return status was always false or always true were converted to void.
Signed-off-by: Donatas Abraitis <donatas.abraitis@gmail.com>
A BGP update-group is dynamically created to group together a set of peers
such that any BGP updates can be formed just once for the entire group and
only the next hop attribute may need to be modified when the update is sent
out to each peer in the group. The update formation code attempts to
determine as much as possible if the next hop will be set to our own IP
address for every peer in the group. This helps to avoid additional checks
at the point of sending the update (which happens on a per-peer basis) and
also because some other attributes may/could vary depending on whether the
next hop is set to our own IP or not. Resetting the next hop to our own IP
address is the most common behavior for EBGP peerings in the absence of
other user-configured or internal (e.g., for l2vpn/evpn) settings and
peerings on a shared subnet.
The code had a flaw in the multiaccess check to see if there are peers in
the update group which are on a shared subnet as the next hop of the path
being announced - the source peer could itself be in the same update group
and cause the check to give an incorrect result. Modify the check to skip
the source peer so that the check is more accurate.
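The corrected check can be sketched as follows. The `toy_peer` layout and the subnet test are simplifying assumptions (IPv4, one connected subnet per peer); they are not bgpd's data structures.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical peer: the connected subnet it is reached through. */
struct toy_peer {
	uint32_t net;  /* network address, host byte order */
	uint32_t mask; /* netmask, host byte order */
};

/* True when some peer in the group, other than the source peer
 * `from`, sits on the same subnet as the path's next hop. */
static bool toy_multiaccess_check(const struct toy_peer *peers, size_t n,
				  const struct toy_peer *from,
				  uint32_t nexthop)
{
	for (size_t i = 0; i < n; i++) {
		const struct toy_peer *p = &peers[i];

		if (p == from)
			continue; /* skip the source of the path */
		if ((nexthop & p->mask) == p->net)
			return true;
	}
	return false;
}
```

Passing `NULL` for `from` reproduces the old behavior, in which the source peer itself can make the check succeed.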
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
Reviewed-by: Donald Sharp <sharpd@cumulusnetworks.com>
Reviewed-by: Don Slice <dslice@cumulusnetworks.com>
For some reason we are getting a compile error around a variable I didn't
touch in the other commits. Make it happy.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
There is no need to have a temp variable to then store that
data in another temporary variable.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The new_afi and afi were being used over and over. Switch
to the end result we want and just use that from the get go.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The creation of a prefix pointer is unnecessary. Save the
prefix as part of the actual data structure. This will
reduce the data needed by 8 bytes per nexthop stored.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
* IPv6 routes received via an ibgp session with one of its own interfaces as
nexthop are getting installed in the BGP table.
* A common table should be implemented that takes care of both
ipv4 and ipv6 connected addresses.
Signed-off-by: Biswajit Sadhu <sadhub@vmware.com>
When a BGP next hop tracking (NHT) entry is created for a peer,
display it in the corresponding "show" command output.
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
Reviewed-by: Donald Sharp <sharpd@cumulusnetworks.com>
It doesn't make much sense for a hash function to modify its argument,
so const the hash input.
BGP does it in a couple places, those cast away the const. Not great but
not any worse than it was.
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
Prevent the ebgp sender from changing the nexthop (which is the same as the
ebgp neighbour's ipv6 address) while sending updates to its ipv6 neighbor.
So, if the nexthop of the ipv6 route is the same as the ipv6 neighbour
address, do not change the next hop to your own ip.
Signed-off-by: Biswajit Sadhu <sadhub@vmware.com>
Prevent IPv6 routes received via an ibgp session with one of its own interface
ips as nexthop from getting installed in the BGP table.
Implemented an IPv6 hash table, to which ipv6 addresses are added as they
get configured and from which they are deleted as they
get unconfigured. This hash table is used to verify whether any route learned
via BGP has a nexthop equal to one of its connected ipv6 interface addresses.
Signed-off-by: Biswajit Sadhu <sadhub@vmware.com>
The bgp_connected_set_node_info and bgp_connected_get_node_info
function names were slightly backwards; let's rename them
to bgp_node_set_bgp_connected_ref_info and bgp_node_get_bgp_connected_ref_info.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The bgp_nexthop_set_node_info and bgp_nexthop_get_node_info
function names were slightly backwards, rename to bgp_node_set and get
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
There is no reason that bgp should be including zebra
headers into its code base; it is a violation of
their respective name spaces.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
If we attempt to register nexthops before we have the zebra
connection, they will not be installed. After we have noticed
that we are up, re-install them.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Problems were reported with the name of the default vrf and the
default bgp instance being different, creating confusion. This
fix changes both to "default" for consistency.
Ticket: CM-21791
Signed-off-by: Don Slice <dslice@cumulusnetworks.com>
Reviewed-by: CCR-7658
Testing: manual testing and automated tests before pushing
The ->hash_cmp and linked list ->cmp functions were sometimes
being used interchangeably and this really is not a good
thing. So let's modify the hash_cmp function pointer to return
a boolean and convert everything to use the new syntax.
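The hazard of mixing the two conventions can be seen in a small sketch. Both functions are hypothetical examples over `int` keys, not functions from the library.

```c
#include <stdbool.h>

/* List-style comparison: negative, zero or positive ordering.
 * Zero means "equal". */
static int toy_list_cmp(const void *a, const void *b)
{
	int x = *(const int *)a;
	int y = *(const int *)b;

	return (x > y) - (x < y);
}

/* Hash-style comparison: true means "equal". */
static bool toy_hash_cmp(const void *a, const void *b)
{
	return *(const int *)a == *(const int *)b;
}
```

For equal keys `toy_list_cmp()` returns 0, which reads as false when used where a hash comparison is expected. Giving the hash callback a `bool` return type makes the difference in meaning visible in the signature.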
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Recent changes to the nht code in bgp caused us to actually
keep a true count of v6 nexthop paths when using v4 over v6.
This change introduced a race condition on shutdown over who
got to the bnc cache first (the v4 table or not). Effectively
we were allowing the continued existence of the path->nexthop
pointing to the freed bnc. This was especially true when
we had route leaking. So when we free the bnc make sure
we clean up the path->nexthop variables pointing at it too.
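A minimal sketch of that cleanup, assuming each cache entry keeps a singly linked list of the paths that point at it. The `toy_*` structures are hypothetical simplifications, not bgpd's definitions.

```c
#include <stddef.h>
#include <stdlib.h>

struct toy_bnc;

struct toy_path {
	struct toy_bnc *nexthop; /* back-pointer to the cache entry */
	struct toy_path *next;   /* next path using the same entry */
};

struct toy_bnc {
	struct toy_path *paths; /* all paths currently pointing at us */
};

static void toy_path_link(struct toy_path *p, struct toy_bnc *bnc)
{
	p->nexthop = bnc;
	p->next = bnc->paths;
	bnc->paths = p;
}

/* Clear every back-pointer before the entry goes away, so that no
 * path is left referencing freed memory. */
static void toy_bnc_free(struct toy_bnc *bnc)
{
	struct toy_path *p = bnc->paths;

	while (p != NULL) {
		struct toy_path *next = p->next;

		p->nexthop = NULL;
		p->next = NULL;
		p = next;
	}
	free(bnc);
}
```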
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
When cleaning up an interface string from the linked list, we were
dropping the name pointer which held the allocated martian address
intf string.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Track the memory type associated with the martian address
interface a bit better, instead of using MTYPE_TMP.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The bgp_nexthop_cache data is stored as a void pointer in `struct bgp_node`.
Abstract retrieval of this data and setting of this data
into functions so that in the future we can move around
what is stored in bgp_node.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The bgp_connected_ref data is stored as a void pointer in `struct bgp_node`.
Abstract retrieval of this data and setting of this data
into functions so that in the future we can move around
what is stored in bgp_node.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Track the refcount a bit differently, since it is possible
to get into situations where we have multiple calls for the
same ifc. So let's just keep a list of the ifc's off of
each `struct bgp_addr` and then keep the hash entry based
upon list count or not.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The `struct bgp_addr` is not needed for anything other than
the address hash. Isolate this data structure so that it
is not polluting up the name space.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
This commit removes various parts of the bgpd implementation code which
are unused/useless, e.g. unused functions, unused variable
initializations, unused structs, ...
Signed-off-by: Pascal Mathis <mail@pascalmathis.com>
When we are shutting down, there exists a code path
where the connected table leaks some memory. Cleanup
the code to remove the memory.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
We have 2 code paths that were duplicating a bunch of code
for the deletion of connected prefixes.
This simplifies the code path and makes the code look a bit
cleaner.
I did not touch the _add path because the v4 if statement
had some code I did not have time to look into. Future project.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Routes that have labels must be sent via a nexthop that also has labels.
This change notes whether any path in a nexthop update from zebra contains
labels. If so, then the nexthop is valid for routes that have labels.
If a nexthop update has no labeled paths, then any labeled routes
referencing the nexthop are marked not valid.
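The rule can be sketched with two small helpers. The names and the boolean interface are assumptions for illustration; bgpd tracks this state with flags on its own structures.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical view of one path in a nexthop update from zebra. */
struct toy_nh_path {
	unsigned num_labels;
};

/* True when at least one path in the update carries labels. */
static bool toy_update_has_labels(const struct toy_nh_path *paths, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (paths[i].num_labels > 0)
			return true;
	return false;
}

/* A labeled route additionally needs a nexthop with labeled paths;
 * an unlabeled route only needs the nexthop to be valid. */
static bool toy_route_nexthop_ok(bool route_has_label, bool nh_valid,
				 bool nh_labeled_valid)
{
	return nh_valid && (!route_has_label || nh_labeled_valid);
}
```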
Add a route flag BGP_INFO_ANNC_NH_SELF that means "advertise myself
as nexthop when announcing" so that we can track our notion of the
nexthop without revealing it to peers.
Signed-off-by: G. Paul Ziemba <paulz@labn.net>
Perf results at scale (>1k peers) showed a non-trivial
amount of time spent in bgp_multiaccess_check_v4. Upon
examining the function, we are looking up the nexthop's
connected node in each call as well as having to unlock
it after each iteration. Rewrite to look up the nexthop
node once.
This should reduce the node lookups by approximately 1/2,
which should yield some performance improvement. There are
probably better things to do here, but they would require
deeper thought.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Problem with not finding the correct bgp instance when doing the command
"show ip bgp vrf <vrf> nexthop" resolved by setting up the arg values
correctly. Manual testing fine. bgp-smoke had no new failures.
Ticket: CM-17454
Signed-off-by: Don Slice <dslice@cumulusnetworks.com>
Reviewed-by: CCR-6664
1) Add hash names to all hash_create calls
2) Fix community_hash, ecommunity_hash and lcommunity_hash key
creation
3) Fix output of community and lcommunity iterators (why would
we want to see the memory location of the bucket?).
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
There are two parts to this commit:
1. Create a database of self tunnel-ips for use in the martian nexthop check
In a CLAG setup, the tunnel-ip (VNI UP) notification comes before the clag-anycast-ip comes up in the system.
This was causing our self next hop check to fail and we were installing routes with martian nexthops in zebra.
We need to keep this info in a separate database for all local tunnel-ips.
This database will be used in parallel with the self next hop database for martian nexthop checks.
2. When a local VNI comes up, update the tunnel-ip database and filter routes in the RD table if necessary
In case of EVPN we might receive routes from the clag peer before the clag-anycast ip and VNI are up on the system.
We will store the routes in the RD table for later processing.
When a VNI comes UP, we loop through all the routes and install them in zebra if required.
However, we were missing the martian nexthop check in this code path.
From now onwards, when a VNI comes UP,
we will first update the tunnel-ip database.
We then loop through all the routes in the RD table and apply the martian next hop filter if required.
Things not covered in this commit but are required:
This processing is needed in general when an address becomes a connected address.
We need to loop through all the routes in BGP and apply the martian nexthop filter if necessary.
This will be taken care of in a separate bug.
Ticket:CM-17271/CM-16911
Reviewed By: ccr-6542
Testing Done: Manual
Signed-off-by: Mitesh Kanjariya <mitesh@cumulusnetworks.com>
Ensure that the check for martian next hop is correct, including for MP
nexthops, if IPv4.
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
Reviewed-by: Donald Sharp <sharpd@cumulusnetworks.com>
The FSF's address changed, and we had a mixture of comment styles for
the GPL file header. (The style with * at the beginning won out with
580 to 141 in existing files.)
Note: I've intentionally left intact other "variations" of the copyright
header, e.g. whether it says "Zebra", "Quagga", "FRR", or nothing.
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
1) Make [<view|vrf> WORD] consistent
2) Fix inconsistent help string
3) Fix the show .. vrf all command
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
thread.c fails to build properly on systems that do
not have a CLOCK_MONOTONIC. Therefore there is
no need for bgp to have knowledge of it.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Additionally:
* Add [ip] to a couple bgp show commands
* Quick refactor of a couple ISIS commands
* Quick refactor of a couple OSPF6 commands
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
If a command is put into the VIEW_NODE, it is going into the
ENABLE_NODE as well. This is especially true for show commands.
As such, if a command is in both, consolidate it down to VIEW_NODE.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
This is a rather large mechanical commit that splits up the memory types
defined in lib/memtypes.c and distributes them into *_memory.[ch] files
in the individual daemons.
The zebra change is slightly annoying because there is no nice place to
put the #include "zebra_memory.h" statement.
bgpd, ospf6d, isisd and some tests were reusing MTYPEs defined in the
library for their own use. This is bad practice and would break when the
memtypes are made static.
Acked-by: Vincent JARDIN <vincent.jardin@6wind.com>
Acked-by: Donald Sharp <sharpd@cumulusnetworks.com>
[CF: rebased for cmaster-next]
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
Signed-off-by: Christian Franke <chris@opensourcerouting.org>
With the addition of AFI_ETHER we need
to initialize the appropriate tables for
nexthops.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Any interface flags/parameter change (e.g., MTU, PROMISC flag change) is
notified by zebra to clients as an "up" event. BGP literally treats this
as the interface coming up and kicks all neighbors on that interface (i.e.,
directly connected peers). When doing so for IPv4 peers on the interface
(numbered or unnumbered /30-/31) or IPv6 numbered peers, peers that may
already be Established are also flapped; when doing so for IPv6 unnumbered
peers (classic 'neighbor swpX interface' scenario with no configured IP
address on interface), only peers not in Established state are processed.
This patch fixes the code to ensure that in all cases, only non-Established
peers are kicked.
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
Reviewed-by: Don Slice <dslice@cumulusnetworks.com>
Reviewed-by: Chris Cormier <chriscormier@cumulusnetworks.com>
Ticket: CM-12526
Reviewed By: CCR-5119
Testing Done: Manual, bgp-min
lib/zebra.h has FILTER_X #define's. These do not belong there.
Put them in lib/filter.h where they belong.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
(cherry picked from commit 0490729cc033a3483fc6b0ed45085ee249cac779)