This addresses deletion of ES interfaces that are were not completely
configured.
Ticket: #2668488
Signed-off-by: Anuradha Karuppiah <anuradhak@nvidia.com>
When PTM sends a "cbl status" message it specifies the interface name
but not the VRF name. It is fine for VRF-lite, but doesn't work for
netns because it's possible to have multiple interfaces with the same
name. Be more restrictive in this case and return an error instead of
randomly using of the interface with the specified name.
Signed-off-by: Igor Ryzhov <iryzhov@nfware.com>
With netns VRF backend, we may have multiple interfaces with the same
name. Currently, the function output is not deterministic in this case,
it returns the first interface that it finds in the list. Be more
explicit and tell the user that we need the VRF name.
Signed-off-by: Igor Ryzhov <iryzhov@nfware.com>
```
exit1-debian-9# show ip route 172.16.16.1/32
Routing entry for 172.16.16.1/32
Known via "bgp", distance 20, metric 0, best
Last update 00:00:28 ago
* 192.168.0.2, via eth1, weight 1
AS-Path : 65003
Communities : first 65001:2 65001:3
Large-Communities: 65001:1:1 65001:1:2 65001:1:3
Selection reason : First path received
```
Signed-off-by: Donatas Abraitis <donatas.abraitis@gmail.com>
Currently, the ll_type is set only in `netlink_interface` which is
executed only during startup. If the interface is created when the FRR
is already running, the type is not stored.
Fixes#1164.
Signed-off-by: Igor Ryzhov <iryzhov@nfware.com>
When a client sends to zebra that GR mode is being turned
on. The client also passes down the time zebra should hold
onto the routes. Display this time with the output
of the `show zebra client` command as well.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
When issuing the `show zebra client` command data about
Graceful Restart state is being printed 2 times.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
In startup, zebra would dump interface information from Kernel in 3
steps w/o lock: step1, get interface information; step2, get interface
ipv4 address; step3, get interface ipv6 address.
If any interface gets added after step1, but before step2/3, zebra
would get extra interface addresses in step2/3 that has not been added
into zebra in step1. Returning error in the referenced interface lookup
would cause the startup interface retrieval to be incomplete.
Signed-off-by: Yuan Yuan <yyuanam@amazon.com>
FRR should only ever use the appropriate THREAD_ON/THREAD_OFF
semantics. This is espacially true for the functions we
end up calling the thread for.
Signed-off-by: Donatas Abraitis <donatas.abraitis@gmail.com>
There's a helper function to check whether the interface is loopback or
VRF - if_is_loopback_or_vrf. Let's use it whenever we need to check that.
There's no functional change in this commit.
Signed-off-by: Igor Ryzhov <iryzhov@nfware.com>
Pass down the safi for when we need address
resolution. At this point in time we are
hard coding the safi to SAFI_UNICAST.
Future commits will take advantage of this.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
PIM is going to need to be able to send down the address it is
trying to resolve in the multicast rib. We need a way to signal
this to the end developer. Start the conversion by adding the
ability to have a safi. But only allow SAFI_UNICAST at the moment.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
The entirety of the import checking no longer needs to be
in zebra as that no-one is calling it. Remove the code.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
These are no longer really needed. The client just needs
to call nexthop resolution instead.
So let's remove the zapi types.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
There were two identical blocks of code run at init time that
requested info about AF_BRIDGE - don't see any reason to do that
twice, so remove one block.
Signed-off-by: Mark Stapp <mstapp@nvidia.com>
Because vrf backend may be based on namespaces, each vrf can
use in the [16-(2^32-1)] range table identifier for daemons that
request it. Extend the table manager to be hosted by vrf.
That possibility is disabled in the case the vrf backend is vrflite.
In that case, all vrf context use the same table manager instance.
Add a configuration command to be able to configure the wished
range of tables to use. This is a solution that permits to give
chunks to bgp daemon when it works with bgp flowspec entries and
wants to use specific iptables that do not override vrf tables.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
When using bgp evpn rt5 setup, after BGP configuration has been
loaded, if the user attempts to detach and reattach the bridged
vxlan interface from the bridge, then BGP loses its BGP EVPN
contexts, and a refresh of BGP configuration is necessary to
maintain consistency between linux configuration and BGP EVPN
contexts (RIB). The following command can lead to inconsistency:
ip netns exec cust1 ip link set dev vxlan1000 nomaster
ip netns exec cust1 ip link set dev vxlan1000 master br1000
consecutive to the, BGP l2vpn evpn RIB is empty, and the way to
solve this until now is to reconfigure EVPN like this:
vrf cust1
no vni 1000
vni 1000
exit-vrf
Actually, the link information is correctly handled. In fact,
at the time of link event, the lower link status of the bridge
interface was not yet up, thus preventing from establishing
BGP EVPN contexts. In fact, when a bridge interface does not
have any slave interface, the link status of the bridge interface
is down. That change of status comes a bit after, and is not
detected by slave interfaces, as this event is not intercepted.
This commit intercepts the bridge link up event, and triggers
a check on slaved vxlan interfaces.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
when running bgp evpn rt5 setup, the Rmac sent in BGP updates
stands for the MAC address of the bridge interface. After
having loaded frr configuration, the Rmac address is not refreshed.
This issue can be easily reproduced by executing some commands:
ip netns exec cust1 ip link set dev br1000 address 2e🆎45:aa:bb:cc
Actually, the BGP EVPN contexts are kept unchanged.
That commit proposes to fix this by intercepting the mac address
change, and refreshing the vxlan interfaces attached to te bridge
interface that changed its MAC address.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
We should not be using `case default` with an enumerated type
This prevents the developer of new cases from knowing where
they need to fix by just compiling.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Move the handler for incoming interface address events
to a neutral source file - it's not netlink-specific and
shouldn't have been in a netlink file.
Signed-off-by: Mark Stapp <mjs.ietf@gmail.com>
Read incoming interface address change notifications in the
dplane pthread; enqueue the events to the main pthread
for processing. This is netlink-only for now - the bsd
kernel socket path remains unchanged.
Signed-off-by: Mark Stapp <mjs.ietf@gmail.com>
Add new apis for dplane interface address handling, based on
the existing api. The existing api is basically split in two:
the first part processes an incoming netlink message in the
dplane pthread, creating a dplane context with info about
the event. The second part runs in the main pthread and uses
the context data to update an interface or connected object.
Signed-off-by: Mark Stapp <mjs.ietf@gmail.com>
Add a new netlink socket for events coming in from the host OS
to the dataplane system for processing. Rename the existing
outbound dplane socket.
Signed-off-by: Mark Stapp <mjs.ietf@gmail.com>
Description: Currently IPv4 routes with IPv6 link local next hops are
not properly installed in FPM.
Reason is the netlink decoding truncates the ipv6 LL address to 4 byte
ipv4 address.
Ex : fe80:: is directly converted to ipv4 and it results in 254.128.0.0
as next hop for below routes
show ip route
Codes: K - kernel route, C - connected, S - static, R - RIP,
O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP,
T - Table, v - VNC, V - VNC-Direct, A - Babel, D - SHARP,
F - PBR, f - OpenFabric,
> - selected route, * - FIB route, q - queued, r - rejected, b - backup
B>* 2.1.0.0/16 [200/0] via fe80::268a:7ff:fed0:d40, Ethernet0, weight 1,
02:22:26
B>* 5.1.0.0/16 [200/0] via fe80::268a:7ff:fed0:d40, Ethernet0, weight 1,
02:22:26
B>* 10.1.0.2/32 [200/0] via fe80::268a:7ff:fed0:d40, Ethernet0, weight
1, 02:22:26
Hence this fix converts the ipv6-LL address to ipv4-LL (169.254.0.1)
address before sending it to FPM. This is inline with how these types of
routes are currently programmed into kernel.
Signed-off-by: Nikhil Kelapure <nikhil.kelapure@broadcom.com>
Current implementation doesn't copy nexthop_srv6. This causes unexpected
behavior when receiving SID information and nexthop isn't onlink.t
Signed-off-by: Ryoga Saito <contact@proelbtn.com>
Problem:
When IP1:M1 (local) moved to IP1:M2 (remote-VTEP) bgpd continues to
advertise IP1:M1.
Fix:
Local path del is sent to bgp if the neigh was {local-active||peer-active}.
So path del needs to be called before the sync flags (including peer-active)
are cleared.
Ticket: #2706744
Signed-off-by: Anuradha Karuppiah <anuradhak@nvidia.com>
When we hand set the router-id, but we have choosen a router-id
that is already the `winner` there is no point in updating anyone
with this data.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
At startup there exists a time frame where we might not know
a particular vrf's router id. When zebra gets a request for
it let's not just blindly send whatever we have. Let's be
a bit smart and only respond with one if we have one.
The upper level protocol can wait for it to have one.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
vrf_name_to_id() returned VRF_DEFAULT when the vrf name was
unknown, hiding errors. Per community recommendation, vrf_name_to_id()
is now removed and the few callers now use vrf_lookup_by_name()
directly.
Signed-off-by: G. Paul Ziemba <paulz@labn.net>
When running bgp evpn rt5 setup with vrf namespace backend, once the
BGP configuration loaded, some refresh like the config change of a
vxlan interface is not taken into account. As consequence, the BGP
l2vpn evpn entries are empty. This can happen by recreating vxlan
interface like follows:
ip netns exec cust1 ip li del vxlan1000
ip link add vxlan1000 type vxlan id 1000 dev loopback0 local 10.209.36.1 learning
ip link set dev vxlan1000 mtu 9000
ip link set dev vxlan1000 netns cust1
ip netns exec cust1 bash
ip link set dev vxlan1000 up
ip link set dev vxlan1000 master br1000
Actually, changing learning attribute requires recreation, and this
change needs to manually reload the frr configuration.
The update mechanism in zebra about vxlan interface updates is
already put in place, but it does not work well with namespace
based vrf backend. The function zl3vni_from_svi() is then
modified to parse all the interfaces of each namespace.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Description:
Change is intended for fixing the following issues related to vrf route leaking:
Routes with special nexthops i.e. blackhole/sink routes when imported,
are not programmed into the FIB and corresponding nexthop is set as 'inactive',
nexthop interface as 'unknown'.
While importing/leaking routes between VRFs, in case of special nexthop(ipv4/ipv6)
once bgp announces route(s) to zebra, nexthop type is incorrectly set as
NEXTHOP_TYPE_IPV6_IFINDEX/NEXTHOP_TYPE_IFINDEX
i.e. directly connected even though we are not able to resolve through an interface.
This leads to nexthop_active_check marking nexthop !NEXTHOP_FLAG_ACTIVE.
Unable to find the active nexthop(s), route is not programmed into the FIB.
Whenever BGP leaks routes, set the correct nexthop type, so that route gets resolved
and correctly programmed into the FIB, in the imported vrf.
Co-authored-by: Kantesh Mundaragi <kmundaragi@vmware.com>
Signed-off-by: Iqra Siddiqui <imujeebsiddi@vmware.com>
Insist on the fact that zclient neighbor state flags are
mapped over netlink state flags. List all the defines
currently known on kernel, and create a netlink API to
convert netlink values to zclient values. The function is
simplified as it is a 1-1 match.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
As NHRP expects some notification of neighboring entries on GRE
interface, when a new interface notification is encountered, the
exact neighbor state flag is found. Previously, the flag passed
to the upper layer was forced to NDM_STATE which is REACHABLE,
as can be seen on below trace:
2021/08/25 10:58:39 NHRP: [QQ0NK-1H449] Netlink: new-neigh 102.1.1.1 dev gre1 lladdr 10.125.0.2 nud 0x2 cache used 1 type 5
When passing the real value, NHRP received an other value like STALE.
2021/08/25 11:28:44 NHRP: [QQ0NK-1H449] Netlink: new-neigh 102.1.1.1 dev gre1 lladdr 10.125.0.2 nud 0x4 cache used 0 type 5
This flag is important for NHRP, as it permits to monitor the link
layer of NHRP entries.
Fixes: d603c0774e ("nhrp, zebra, lib: enforce usage of zapi_neigh_ip structure")
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
"[no] netns NAME" commands are part of the lib, but they are actually
zebra-only:
- they are using vrf_netns_handler_create and its description clearly
says that it "should be called from zebra only"
- vtysh sends these commands only to zebra
- only zebra outputs the netns related config
- zebra notifies other daemons about netns attachment
Signed-off-by: Igor Ryzhov <iryzhov@nfware.com>
There is a possibility that the same line can be matched as a command in
some node and its parent node. In this case, when reading the config,
this line is always executed as a command of the child node.
For example, with the following config:
```
router ospf
network 193.168.0.0/16 area 0
!
mpls ldp
discovery hello interval 111
!
```
Line `mpls ldp` is processed as command `mpls ldp-sync` inside the
`router ospf` node. This leads to a complete loss of `mpls ldp` node
configuration.
To eliminate this issue and all possible similar issues, let's print an
explicit "exit" at the end of every node config.
This commit also changes indentation for a couple of existing exit
commands so that all existing commands are on the same level as their
corresponding node-entering commands.
Fixes#9206.
Signed-off-by: Igor Ryzhov <iryzhov@nfware.com>
For some reason commit #ef524230a6baa decided
to remove enums and switch to uint16_t. Which
is not the right thing to do. Put it back
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Maybe with empty nexthop to call zebra_mpls_transit_lsp():
"no mpls lsp (16-1048575)".
So just remove this "gate_str" check. If without "gate" in command, "gtype" is
set to NEXTHOP_TYPE_BLACKHOLE for subsequent processing.
Signed-off-by: anlan_cs <anlan_cs@tom.com>
When NHRP registers to zebra to receive link layer events related to
gre interfaces, then it is interested in receiving also RTM_GETNEIGH
messages.
Fixes ("b3b751046495") nhrpd: link layer registration to notifications
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
In zebra_evpn_proc_remote_nh if we do not pass in a long
enough stream, the stream reads will fail. Ensure that
we have enough data.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Handle TYPE_IFINDEX nexthops more consistently in a few places;
be more specific about a few integer return values that were
being treated as booleans.
Signed-off-by: Mark Stapp <mjs.ietf@gmail.com>
When calling rib_add_multipath_nhe ensure that we have
well aligned return codes that mean something so that
interersted parties can properly handle the situation.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
When receiving a route via zapi, if the route is rejected
there exists a code path where we would not free the corresponding
re created.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
The command `debug zebra kernel msgdump is netlink specific.
There is no point at all to allow this to be configed on non
netlink platforms.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
There were a bunch of places where we converted the
route node to a prefix string via srcdest_rnode2str when
we should have been using %pRN in zebra_rib.c. Just
convert over the ones we should to use it.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
When we are calling rib_process and the route_node
in question has no dest, there is no work to do here
at all. As such we should just return before
attempting to do any other work. This is just a tiny bit
of simplification being done.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
There exists a call path where the nhlfe_alloc can return NULL
for blackhole nexthops. In this case we were still trying
to save the nhlfe pointer causing a crash when we attempted
to add it to a self-contained list.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Do not use the `default` case when switching over an enumerated
type. This allows the code to fail to compile when we add a
new enumeration. Thus allowing us developers to know all
the places in the code we'll need to touch.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
1. This check is absolutely useless. Nothing keeps user from deleting
the address right after this check.
2. This check prevents zebra from correctly reading the user config with
"set src" because of a race with interface startup (see #4249).
3. NO OPERATIONAL DATA USAGE ON VALIDATION STAGE.
Fixes#7319.
Signed-off-by: Igor Ryzhov <iryzhov@nfware.com>
v4 and v6 host/refernce prefixes need to be setup separately for
[RMAC, VTEP] entries as the VTEP is always normalized to a v4 addr.
Signed-off-by: Anuradha Karuppiah <anuradhak@nvidia.com>
The only difference in daemons' interface node definition is the config
write function. No need to define the node in every daemon, just pass
the callback as an argument to a library function and define the node
there.
Signed-off-by: Igor Ryzhov <iryzhov@nfware.com>
There exists some rare situations where fpm will attempt
to send a route update with no valid nexthops. In that
case an assert would be hit. This is not good for
trying to keep your routing daemons up and running
when we can safely just recover the situation.
Fixes#7588
Signed-off-by: batmancn <batmanustc@gmail.com>
<fixed commit message, and used zlog_err>
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Currently 'show evpn rmac vni .. mac .. json' includes fields for
localSequence and remoteSequence, which are misleading since they
aren't applicable to a macs in the IP-VRF mac table (RMAC).
This removes the localSequence + remoteSequence fields from the output.
Signed-off-by: Trey Aspelund <taspelund@nvidia.com>
like the other automake variables, setting `xyz_LDFLAGS` causes
`AM_LDFLAGS` to be ignored for `xyz`. For some reason I had in my mind
that automake doesn't do this for LDFLAGS, but... it does. (Which is
consistent with `_CFLAGS` and co.)
So, all the libraries and modules have been ignoring `AM_LDFLAGS` (which
includes `SAN_FLAGS` too). Set up new `LIB_LDFLAGS` and
`MODULE_LDFLAGS` to handle all of this correctly (and move these bits to
a central location.)
Fixes: #9034
Fixes: 0c4285d77e ("build: properly split CFLAGS from AC_CFLAGS")
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
Notice when a ip address on a bsd interface is considered
an alias, let's mark the connected prefix we generate as
a SECONDARY.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
When port was removed from last access vlan, the linux kernel
won't send any vlan info in the netlink message, it might affact
the evpn mh not withdraw EAD-EVI routes.
Signed-off-by: Gord Chen <gord_chen@edge-core.com>
Current code was allowing redistribution of kernel routes from
the non-default non vrf tables once FRR was already up and running.
In the case where we add `redistribute kernel` in an upper level
protocol we never consider the non-default vrf or non-vrf tables
so it is never accepted.
In the case where a kernel route is added after `redistribute kernel`
is already in place we were never looking at the fact that the
route was in a non-default non-vrf table. This code fixes
that issue.
Fixes: #9073
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Move remote VTEP updates from immediate, inline processing
in their ZAPI message handlers to the main workqueue.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
Enqueue incoming vxlan remote macip updates on the main
workqueue, instead of performing the updates immediately,
in-line.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
Add workqueue subqueue for EVPN/VxLAN updates; migrate the
evpn route and remote ES processing from their ZAPI handlers
to the workqueue.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
At some point we broke the ifp pointer for nhe->ifp such
that it was pointing to an interface even in groups/recurisve
instances.
Add checks here to make it again so that we only set the ifp
pointer if it is a fully resolved singleton NHE.
Signed-off-by: Stephen Worley <sworley@nvidia.com>
In the reachability code we auto pass back the fully resolved
nexthops. Modify the ZEBRA_IPV4_NEXTHOP_LOOKUP_MRIB code
to do the exact same thing so that the zclient_lookup_nexthop
code does not need to recursively look for the data that
zebra already has.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>