Allow bgp to set a local Administrative distance to use
for installing routes into the rib.
Example:
!
router bgp 9323
bgp router-id 1.2.3.4
neighbor enp0s8 interface remote-as external
!
address-family ipv4 unicast
neighbor enp0s8 route-map DISTANCE in
exit-address-family
!
route-map DISTANCE permit 10
set distance 153
!
line vty
!
end
eva# show ip route
Codes: K - kernel route, C - connected, S - static, R - RIP,
O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP,
T - Table, v - VNC, V - VNC-Direct, A - Babel, D - SHARP,
F - PBR, f - OpenFabric,
> - selected route, * - FIB route, q - queued route, r - rejected route
B 0.0.0.0/0 [153/0] via fe80::a00:27ff:fe84:c2d6, enp0s8, 00:00:06
K>* 0.0.0.0/0 [0/100] via 10.0.2.2, enp0s3, 00:06:31
B>* 1.1.1.1/32 [153/0] via fe80::a00:27ff:fe84:c2d6, enp0s8, 00:00:06
B>* 1.1.1.2/32 [153/0] via fe80::a00:27ff:fe84:c2d6, enp0s8, 00:00:06
B>* 1.1.1.3/32 [153/0] via fe80::a00:27ff:fe84:c2d6, enp0s8, 00:00:06
C>* 10.0.2.0/24 is directly connected, enp0s3, 00:06:31
K>* 169.254.0.0/16 [0/1000] is directly connected, enp0s3, 00:06:31
eva#
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
This change addresses the following:
1) Ensures logs under DEBUG macro checks are categorized
as zlog_debug instead of zlog_info.
2) Error logs are categorized as zlog_err instead of zlog_info.
3) Rephrasing certain logs to make them appear more intuitive.
Signed-off-by: NaveenThanikachalam <nthanikachal@vmware.com>
Problem reported that when vrf route-leaking between an unnumbered
peer in one vrf to a numbered peer in another vrf, the nexthop
attribute was missing from the update, causing the session to fail.
determined that we needed to expand the mechanism for verifying if
the route has been learned in the other vrf without an ipv4 nexthop.
Ticket: CM-25610
Signed-off-by: Don Slice <dslice@cumulusnetworks.com>
bgp update messages were not correctly calculating the size
for a labeled-unicast prefix, as they were not accounting
for the label. If the update message was large enough to
overflow the maximum packet size (4096 bytes) this could
cause bgpd to send a malformed update packet.
Signed-off-by: Emanuele Di Pascale <emanuele@voltanet.io>
Modify the code such that we can auto turn the iana values of afi
and safi to pleasant to read strings.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The bgp_attr_extcom_tunnel_type does not properly
compile with warnings turned on due to recent change.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
It doesn't make much sense for a hash function to modify its argument,
so const the hash input.
BGP does it in a couple places, those cast away the const. Not great but
not any worse than it was.
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
This diff contains 2 parts:
1. Extract the tunnel type info from bgp extended communities.
2. Make rfapi use this common tunnel type ap
Signed-off-by: Lakshman Krishnamoorthy <lkrishnamoor@vmware.com>
This is causing interop issues with vendors. According to the RFC,
receiver should ignore the NEXT_HOP attribute with MP_REACH_NLRI
present.
Signed-off-by: nikos <ntriantafillis@gmail.com>
This is causing interop issues with vendors. According to the RFC,
receiver should ignore the NEXT_HOP attribute with MP_REACH_NLRI
present.
Signed-off-by: nikos ntriantafillis@gmail.com
When using remove-private-AS together with local-as
aspath_remove_private_asns() is called before bgp_packet_attribute().
In this case, private AS will always appear in front of change_local_as.
Signed-off-by: Donatas Abraitis <donatas.abraitis@gmail.com>
We have the same warn message in 3 spots, which makes it extremely
hard to figure out which of the 3 has gone terribly wrong.
Add a bit of code to disambiguate the 3 situations.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Any evpn bgp update message comes with router mac extended
community, which can potentially contain the madd adddress
same as any of the local SVIs (L3VNI) MAC address.
Set route mac exist and during route processing in
bgp_update() filter the route.
Ticket:CM-23674
Reviewed By:CCR-8336
Testing Done:
Configure L3vni mac on TORS1 which is similar to TORC11
L3vni MAC. When TORC11 received the EVPN update with
Router mac extended community, this check rejected the
BGP update message.
Signed-off-by: Chirag Shah <chirag@cumulusnetworks.com>
Consider the following topo VTEP1->SPINE1->VTEP2. ebgp is being used
for evpn route exchange with SPINE just acting as a pass through.
1. VTEP1 was building the type-3 IMET route with the correct PMSI
tunnel type (ingress-replication) and label (VNI)
2. Spine1 was however only parsing the tunnel-type in the attr (was
skipping parsing of the label field altogether) -
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
root@MSP1:~# net show bgp l2vpn evpn route rd 27.0.0.15:4 type multicast
EVPN type-2 prefix: [2]:[ESI]:[EthTag]:[MAClen]:[MAC]
EVPN type-3 prefix: [3]:[EthTag]:[IPlen]:[OrigIP]
EVPN type-5 prefix: [5]:[ESI]:[EthTag]:[IPlen]:[IP]
BGP routing table entry for 27.0.0.15:4:[3]:[0]:[32]:[27.0.0.15]
Paths: (1 available, best #1)
Advertised to non peer-group peers:
TORC11(downlink-1) TORC12(downlink-2) TORC21(downlink-3) TORC22(downlink-4) TORS1(downlink-5) TORS2(downlink-6)
Route [3]:[0]:[32]:[27.0.0.15]
5550
27.0.0.15 from TORS1(downlink-5) (27.0.0.15)
Origin IGP, valid, external, bestpath-from-AS 5550, best
Extended Community: RT:5550:1003 ET:8
AddPath ID: RX 0, TX 227
Last update: Thu Feb 7 15:44:22 2019
PMSI Tunnel Type: Ingress Replication, label: 16777213 >>>>>>>
Displayed 1 prefixes (1 paths) with this RD (of requested type)
root@MSP1:~#
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
3. So VTEP2 didn't rx the correct label.
In an all FRR setup this doesn't have any functional consequence but some
vendors are validating the content of the label field as well and ignoring
the IMET route from FRR (say VTEP1 is FRR and VTEP2 is 3rd-party). The
functional consequence of this VTEP2 ignores VTEP1's IMET route and doesn't
add VTEP1 to the corresponding l2-vni flood list.
This commit fixes up the PMSI attr parsing on spine-1 -
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
root@MSP1:~# net show bgp l2vpn evpn route rd 27.0.0.15:4 type multicast
EVPN type-2 prefix: [2]:[ESI]:[EthTag]:[MAClen]:[MAC]
EVPN type-3 prefix: [3]:[EthTag]:[IPlen]:[OrigIP]
EVPN type-5 prefix: [5]:[ESI]:[EthTag]:[IPlen]:[IP]
BGP routing table entry for 27.0.0.15:4:[3]:[0]:[32]:[27.0.0.15]
Paths: (1 available, best #1)
Advertised to non peer-group peers:
TORC11(downlink-1) TORC12(downlink-2) TORC21(downlink-3) TORC22(downlink-4) TORS1(downlink-5) TORS2(downlink-6)
Route [3]:[0]:[32]:[27.0.0.15]
5550
27.0.0.15 from TORS1(downlink-5) (27.0.0.15)
Origin IGP, valid, external, bestpath-from-AS 5550, best
Extended Community: RT:5550:1003 ET:8
AddPath ID: RX 0, TX 278
Last update: Thu Feb 7 00:17:40 2019
PMSI Tunnel Type: Ingress Replication, label: 1003 >>>>>>>>>>>
Displayed 1 prefixes (1 paths) with this RD (of requested type)
root@MSP1:~#
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
Ticket: CM-23790
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
Currently we are hardcoding it at the time of attr building to
ingress-replication. This is just a code clean-up and has no
functional impact.
Ticket: CM-23790
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
The `struct bgp_route_evpn` and `struct overlay_index` data
structures are exactly the same. Reduce to 1.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
community_free, lcommunity_free and ecommunity_free are similar type of functions. Most of the places, these three are called together. The signature of community_free is different from other two functions. Modified the community_free API signature to align with other two functions to avoid any confusion. There is no functionality impact with this and this is just to avoid any confusion.
Testing: manual testing and show commands
Signed-off-by: Sri Mohana Singamsetty msingamsetty@vmware.com
The ->hash_cmp and linked list ->cmp functions were sometimes
being used interchangeably and this really is not a good
thing. So let's modify the hash_cmp function pointer to return
a boolean and convert everything to use the new syntax.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Add the ability to aggregate routes to handle
extended communities. Make the actions similiar
to what we do for normal communities.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The peer->nexthop.ifp pointer must be set when parsing the
attributes in bgp_mp_reach_parse, notice this
and fail gracefully.
Rework bgp_nexthop_set to remove the HAVE_CUMULUS and to
fail the nexthop_set when we have a zebra connection and
no ifp pointer, as that not havinga zebra connection and
no ifp pointer is legal.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
EVPN ND ext community support NA flag R-bit, to have proxy ND.
Set R-bit in EVPN NA if a given router is default gateway or there is a
local
router attached, which can be determine based on local neighbor entry.
Implement BGP ext community attribute to generate and parse R-bit and
pass along zebra to program neigh entry in kernel.
Upon receiving MAC/IP update with community type 0x06 and sub_type 0x08,
pass the R-bit to zebra to program neigh entry.
Set NTF_ROUTER in neigh entry and inform kernel to do proxy NA for EVPN.
Ref:
https://tools.ietf.org/html/draft-ietf-bess-evpn-na-flags-01
Ticket:CM-21712, CM-21711
Reviewed By:
Testing Done:
Configure Local vni enabled L3 Gateway, which would act as router,
checked
show evpn arp-cache vni x ip <ip of svi> on originated and remote VTEPs.
"Router" flag is set.
Signed-off-by: Chirag Shah <chirag@cumulusnetworks.com>
On the case where an mp_unreach attribute is received, while there is no
mp_reach attribute too, it is not necessary to check for missing
attributes.
Fixes: 67495ddb2e ("bgpd: Fixes for recent well-known-attr check patch.")
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
This commit moves the command 'bgp enforce-first-as' from global BGP
instance configuration to peer/neighbor configuration, which can now be
changed by executing '[no] neighbor <neighbor> enforce-first-as'.
End users can now enforce sane first-AS checking on regular sessions
while e.g. disabling the checks on routeserver sessions, which usually
strip away their own AS number from the path.
To ensure backwards-compatibility, a migration routine was added which
automatically sets the 'enforce-first-as' flag on all configured
neighbors if the old global setting was activated. The old global
command immediately disappears after running the migration routine once.
Signed-off-by: Pascal Mathis <mail@pascalmathis.com>
Handle multiple PREFIX_SID's at the same time. The draft clearly
states that multiple should be handled and we have a actual pcap
file that clearly has multiple PREFIX_SID's at the same time.
Fixes: #2153
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The vrf 2 vrf route leaking auto-derives RD and RT and
installs the routes into the appropriate vpn table.
These routes when a operator configured ipv[4|6] vpn
neighbors were showing up off box. The RD and RT
values choosen are localy significant but globaly
useless and may cause confusion.
Put a special bit of code in to notice that we
should not be advertising these routes off box.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Connected routes redistributed into BGP as well as IPv4 routes with IPv6
link-local next hops (RFC 5549) need information about the associated
interface in BGP if they are candidates to be leaked into another VRF. In
the absence of route leaking, this was not necessary. Introduce the
appropriate mechanism and ensure this is used during route install (in
the target VRF).
Ticket: CM-20343, CM-20382
Testing done:
1. Manually verified failed scenarios and some additional ones - logs
in the tickets.
2. Ran bgp-min and evpn-min - results are good.
3. Ran vrf smoke - has some failures, but none which look new
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
FS UNREACH message with 0 NLRI inside is sent after each peer
establishment. FS can send NLRI messages with no nexthop.
The commit fixes a message that is triggered by mistake
if FS was about to be sent, then that message is not output.
Also it fixes a typo.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
This work is derived from a work done by China-Telecom.
That initial work can be found in [0].
As the gap between frr and quagga is important, a reworks has been
done in the meantime.
The initial work consists of bringing the following:
- Bringing the client side of flowspec.
- the enhancement of address-family ipv4/ipv6 flowspec
- partial data path handling at reception has been prepared
- the support for ipv4 flowspec or ipv6 flowspec in BGP open messages,
and the internals of BGP has been done.
- the memory contexts necessary for flowspec has been provisioned
In addition to this work, the following has been done:
- the complement of adaptation for FS safi in bgp code
- the code checkstyle has been reworked so as to match frr checkstyle
- the processing of IPv6 FS NLRI is prevented
- the processing of FS NLRI is stopped ( temporary)
[0] https://github.com/chinatelecom-sdn-group/quagga_flowspec/
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Signed-off-by: jaydom <chinatelecom-sdn-group@github.com>
The following types are nonstandard:
- u_char
- u_short
- u_int
- u_long
- u_int8_t
- u_int16_t
- u_int32_t
Replace them with the C99 standard types:
- uint8_t
- unsigned short
- unsigned int
- unsigned long
- uint8_t
- uint16_t
- uint32_t
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
Received PMSI tunnel attributes (in EVPN type-3 route) were not recognized.
Parse them and display the tunnel type when looking at routes. Note that
the only tunnel type currently supported is ingress replication (IR). A
warning message will be logged if the received tunnel type is something
else, but the attribute is otherwise ignored.
Updates: a21bd7a (bgpd: add PMSI_TUNNEL_ATTRIBUTE to EVPN IMET routes)
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
When doing symmetric routing,
EVPN type-2 (MACIP) routes need to be advertised with two labels (VNIs)
the first being the L2 VNI (identifying the VLAN) and
the second being the L3 VNI (identifying the VRF).
The receive processing needs to handle one or two labels too.
Ticket: CM-18489
Review: CCR-6949
Testing: manual and bgp/evpn/mpls smoke
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
1. Added default gw extended community
2. code modification to handle sticky-mac/default-gw-mac as they go together
3. show command support for newly added extended community
4. State in zebra to reflect if a mac/neigh is default gateway
5. show command enhancement to refelect the same in zebra commands
Ticket: CM-17428
Review: CCR-6580
Testing: Manual
Signed-off-by: Mitesh Kanjariya <mitesh@cumulusnetworks.com>
Some of the deprecated stream.h macros see such little use that we may
as well just remove them and use the non-deprecated macros.
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
* Move and modify all network input related code to bgp_io.c
* Add a real input buffer to `struct peer`
* Move connection initialization to its own thread.c task instead of
piggybacking off of bgp_read()
* Tons of little fixups
Primary changes are in bgp_packet.[ch], bgp_io.[ch], bgp_fsm.[ch].
Changes made elsewhere are almost exclusively refactoring peer->ibuf to
peer->curr since peer->ibuf is now the true FIFO packet input buffer
while peer->curr represents the packet currently being processed by the
main pthread.
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
So we have the ability to apply speculative route-maps to
neighbor display to see what the changes would look like
via some show commands. When we do this we make a
shallow copy of the attr data structure and then pass
it around for applying the routemap. After we've applied
this route-map and displayed it we really need to clean
up memory that the route-map application applied.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
A crafted BGP UPDATE with a malformed path attribute length field causes
bgpd to dump up to 65535 bytes of application memory and send it as the
data field in a BGP NOTIFY message, which is truncated to 4075 bytes
after accounting for protocol headers. After reading a malformed length
field, a NOTIFY is generated that is supposed to contain the problematic
data, but the malformed length field is inadvertently used to compute
how much data we send.
CVE-2017-15865
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
This issue was discovered on a live session with an extremely
old cisco 7206VXR router running 12.2(33)SRE4. The sending router
is sending us an empty NLRI that is MP_REACH. From RFC
exploration(thanks Russ!) it appears that this was
considered a 'valid' way to send EOR.
Following discussion decided that we should treat
this situation as a EOR marker instead of bringing
down the session.
Applying this fix on the FRR router seeing this issue
allows it to continue it's peering relationship with
the ASR. Since this is a point fix I do not see
a high likelihood of further fallout.
Fixes: #1258
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Commit 8c9cc7bbf6 changed the size
of the `struct bgp_attr_encap_subtlv` type to be a zero length
array at the end instead of having a 1 byte. All memory allocations
for this subsuquently were off by 1 byte since those were not
adjusted either.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
stlv_last is initialized with the loops. No need to reset it.
Its scope is local to the use with the loops.
Signed-off-by: Vincent Jardin <vincent.jardin@6wind.com>
These are now unused. route-maps can't modify these attributes, so
there is no need for _dup functions.
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
bgp_attr_deep_dup is based on a misunderstanding of how route-maps work.
They never change actual data, just pointers & fields in "struct attr".
The correct thing to do is copy struct attr and call bgp_attr_flush()
afterwards.
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
This attempt at optimization has cost us more than a week's worth of
time on several people hunting down the subtle bug that it was missing
an increment on attr->lcommunity.
This is absolutely not worth the maintenance cost.
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
1) Add hash names to all hash_create calls
2) Fix community_hash, ecommunity_hash and lcommunity_hash key
creation
3) Fix output of community and lcommunity iterators( why would
we want to see the memory location of the backet? ).
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
A prior change broke the nexthop setting for labeled-unicast
address-family in a RFC-5549 scenario (IPv4 prefixes exchanged
with IPv6 next hops). This commit fixes the issue.
Fixes: "bgpd: Fix next hop setting for EVPN"
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
Commit c8e7b895 ("bgpd: use Jenkins hash for BGP transit, cluster and
attr hashes") changed attrhash_key_make() to use Jenkins hash, commit
c8f3fe30 ("bgpd: Remove AS Path limit/TTL functionality") introduced
a bogus change with a snippet of code that was deleted in the first
one.
Signed-off-by: Jorge Boncompte <jbonor@gmail.com>
This reverts commit c14777c6bf.
clang 5 is not widely available enough for people to indent with. This
is particularly problematic when rebasing/adjusting branches.
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
Most of the attributes in 'struct attr_extra' allow for
the more interesting cases of using bgp. The extra
overhead of managing it will induce errors as we add
more attributes and the extra memory overhead is
negligible on anything but full bgp feeds.
Additionally this greatly simplifies the code for
the handling of data.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
bgpd: Fix missing label set
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Implement support for sticky (static) MACs. This includes the following:
- Recognize MAC is static (using NUD_NOARP flag) and inform BGP
- Construct MAC mobility extended community for sticky MACs as per
RFC 7432 section 15.2
- Inform to zebra that remote MAC is sticky, where appropriate
- Install sticky MACs into the kernel with the right flag
- Appropriate handling in route selection
Signed-off-by: Daniel Walton <dwalton@cumulusnetworks.com>
Reviewed-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
Reviewed-by: Donald Sharp <sharpd@cumulusnetworks.com>
Core EVPN route handling functionality. This includes support for the
following:
- interface with zebra to learn about local VNIs and MACIPs as well as
to install remote VTEPs (per VNI) and remote MACIPs
- create/update/delete EVPN type-2 and type-3 routes
- attribute creation, route selection and install
- route handling per VNI and for the global routing table
- parsing of received EVPN routes and handling by route type
- encoding attributes for EVPN routes and EVPN prefix creation (for
Updates)
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
Reviewed-by: Donald Sharp <sharpd@cumulusnetworks.com>
Reviewed-by: Daniel Walton <dwalton@cumulusnetworks.com>
The next hop for EVPN routes must be an IPv4 or IPv6 address as per
RFC 7432. Ensure this is correctly handled. Also, ensure there
are correct checks for AFI_L2VPN and nexthop AFI is not AFI_L2VPN.
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
log.c provides functionality for associating a constant (typically a
protocol constant) with a string and finding the string given the
constant. However this is highly delicate code that is extremely prone
to stack overflows and off-by-one's due to requiring the developer to
always remember to update the array size constant and to do so correctly
which, as shown by example, is never a good idea.b
The original goal of this code was to try to implement lookups in O(1)
time without a linear search through the message array. Since this code
is used 99% of the time for debugs, it's worth the 5-6 additional cmp's
worst case if it means we avoid explitable bugs due to oversights...
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
Signed-off-by: Daniel Walton <dwalton@cumulusnetworks.com>
- All ipv4 labeled-unicast routes are now installed in the ipv4 unicast
table. This allows us to do things like take routes from an ipv4
unicast peer, allocate a label for them and TX them to a ipv4
labeled-unicast peer. We can do the opposite where we take routes from
a labeled-unicast peer, remove the label and advertise them to an ipv4
unicast peer.
- Multipath over a labeled route and non-labeled route is not allowed.
- You cannot activate a peer for both 'ipv4 unicast' and 'ipv4
labeled-unicast'
- The 'tag' variable was overloaded for zebra's route tag feature as
well as the mpls label. I added a 'mpls_label_t mpls' variable to
avoid this. This is much cleaner but resulted in touching a lot of
code.
The bpacket_reformat_for_peer() function rewrites the nexthop of outgoing
route updates on a per-peer basis in order to handle route-maps ("set
ip next-hop") and locally-originated routes missing a nexthop.
In the latter case, RFC 4271 says the following: "When announcing a
locally-originated route to an internal peer, the BGP speaker SHOULD use
the interface address of the router through which the announced network
is reachable for the speaker as the NEXT_HOP".
We were doing this for regular IPv4/IPv6 routes, but not for
VPN/EVPN/ENCAP routes, which were being announced with invalid nexthops
(0.0.0.0 or ::).
This patch fixes this problem.
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
Signed-off-by: Daniel Walton <dwalton@cumulusnetworks.com>
Added bgp_nexthop_afi() to have one place that determines what the
Nexthop AFI is for bgp_packet_mpattr_start()
The FSF's address changed, and we had a mixture of comment styles for
the GPL file header. (The style with * at the beginning won out with
580 to 141 in existing files.)
Note: I've intentionally left intact other "variations" of the copyright
header, e.g. whether it says "Zebra", "Quagga", "FRR", or nothing.
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
Signed-off-by: Daniel Walton <dwalton@cumulusnetworks.com>
The initial implementation was against draft-keyupate-idr-bgp-prefix-sid-02
This updates our label-index implementation up to draft-ietf-idr-bgp-prefix-sid-05
- changed BGP_ATTR_LABEL_INDEX to BGP_ATTR_PREFIX_SID
- since there are multiple TLVs in BGP_ATTR_PREFIX_SID you can no longer
rely on that flag to know if there is a label-index for the path. I
changed bgp_attr_extra_new() to init the label_index to
BGP_INVALID_LABEL_INDEX
- put some placeholder code in for the other two TLVs (IPv6 and
Originator SRGB)
Signed-off-by: Daniel Walton <dwalton@cumulusnetworks.com>
- cleaned up the "show bgp ipv4 labeled-unicast x.x.x.x" output
- fixed some json keys to use camelCase
- bgp_attr_label_index() was clearing BGP_ATTR_LABEL_INDEX because it
was comparing mp_update->afi against SAFI_LABELED_UNICAST instead of
mp_update->safi
- added BGP_ATTR_LABEL_INDEX to attr_str
Labeled-unicast updates were being sent with an ipv6 nexthop due to
not setting the mp_nexthop_len or nh_afi. On the receive side, the
prefix length was being incorrectly determined and has been fixed.
Also the stream for bgp_label_buf was not created. All resolved.
Ticket: CM-15260
Signed-off-by: Don Slice <dslice@cumulusnetworks.com>
Reviewed-by:
Implement BGP Prefix-SID IETF draft to be able to signal a labeled-unicast
prefix with a label index (segment ID). This makes it easier to deploy
global MPLS labels with BGP, even without other aspects of Segment Routing
implemented.
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
Implement support for negotiating IPv4 or IPv6 labeled-unicast address
family, exchanging prefixes and installing them in the routing table, as
well as interactions with Zebra for FEC registration. This is the
implementation of RFC 3107.
Signed-off-by: Don Slice <dslice@cumulusnetworks.com>
To know if overlay index is the same between two route information,
then the two overlay index field is compared. If both fields are set to
null, then the comparison should be equal, then return true, which was
not done.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
As per draft-ietf-idr-tunnel-encaps-02, section 3.2.1, BGP Encap
attribute supports vxlan tunnel type. A new tunnel attribute has been
appended to subtlv list, describing the vxlan network identifier to
be used for the routing information of the BGP update message.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
This patch introduces the ability to make route type 5 message
when EVPN is enabled. Picked up paramters are collected from the
bgp extra attribute structure and are the ESI, the ethernet tag
information. In addition to this, nexthop attribute is collected too.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
This patch appends nexthop attribute to EVPN message, in addition
to appending gateway IP in RT-5 NLRI itself. In reception, if
both informations are stored, then the GW IP information will
supersede the NLRI nexthop attribute.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
The commit introduces the changes to be done to carry route type 5 EVPN
information in bgp extra attribute information. The commit also handles
the update processing for route type 5 information, including ESI,
gatewayIP and label information.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
(to match surrounding code)
"git diff -w" should be almost empty.
Copyright edited to say FRR, this is not GNU Zebra :)
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
BGP Large Communities are a novel way to signal information between
networks. An example of a Large Community is: "2914:65400:38016". Large
BGP Communities are composed of three 4-byte integers, separated by a
colon. This is easy to remember and accommodates advanced routing
policies in relation to 4-Byte ASNs.
This feature was developed by:
Keyur Patel <keyur@arrcus.com> (Arrcus, Inc.),
Job Snijders <job@ntt.net> (NTT Communications),
David Lamparter <equinox@opensourcerouting.org>
and Donald Sharp <sharpd@cumulusnetworks.com>
Signed-off-by: Job Snijders <job@ntt.net>
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Introduce internal and IANA defintions for AFI/SAFI and mapping
functions and modify code to use these. This refactoring will
facilitate adding support for other AFI/SAFI whose IANA values
won't be suitable for internal data structure definitions (e.g.,
they are not contiguous).
The commit adds some fixes related to afi/safi testing with 'make check
' command.
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
Reviewed-by: Donald Sharp <sharpd@cumulusnetworks.com>
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Ticket: CM-11416
Reviewed By: CCR-3594 (mpls branch)
Testing Done: Not tested now, tested earlier on mpls branch
* bgpd parses NLRIs twice, a first pass "sanity check" and then a second pass
that changes actual state. For most AFI/SAFIs this is done by
bgp_nlri_sanity_check and bgp_nlri_parse, which are almost identical.
As the required action on a syntactic error in an NLRI is to NOTIFY and
shut down the session, it should be acceptable to just do a one pass
parse. There is no need to atomically handle the NLRIs.
* bgp_route.h: (bgp_nlri_sanity_check) Delete
* bgp_route.c: (bgp_nlri_parse) Make the prefixlen size check more general
and don't hard-code AFI/SAFI details, e.g. use prefix_blen library function.
Add error logs consistent with bgp_nlri_sanity_check as much as possible.
Add a "defense in depth" type check of the prefixlen against the sizeof
the (struct prefix) storage - ala bgp_nlri_parse_vpn.
Update standards text from draft RFC4271 to the actual RFC4271 text.
Extend the semantic consistency test of IPv6. E.g. it should skip mcast
NLRIs for unicast safi as v4 does.
* bgp_mplsvpn.{c,h}: Delete bgp_nlri_sanity_check_vpn and make
bgp_nlri_parse_vpn_body the bgp_nlri_parse_vpn function again.
(bgp_nlri_parse_vpn) Remove the notifies. The sanity checks were
responsible for this, but bgp_update_receive handles sending NOTIFY
generically for bgp_nlri_parse.
* bgp_attr.c: (bgp_mp_reach_parse,bgp_mp_unreach_parse) Delete sanity check.
NLRI parsing done after attr parsing by bgp_update_receive.
Arising out of discussions on the need for two-pass NLRI parse with:
Lou Berger <lberger@labn.net>
Donald Sharp <sharpd@cumulusnetworks.com>
* bgp_route.h: (bgp_nlri_sanity_check) The bulk of the args are equivalent
to a (struct bgp_nlri), consolidate.
* bgp_route.c: (bgp_nlri_sanity_check) Make this a frontend for all afi/safis.
Including SAFI_MPLS_LABELED_VPN.
(bgp_nlri_sanity_check_ip) Regular IP NLRI sanity check based on the
existing code, and adjusted for (struct bgp_nlri *) arg.
* bgp_attr.c: (bgp_mp_reach_parse) Adjust for passing (struct bgp_nlri *)
to bgp_nlri_sanity_check.
Get rid of special-casing to not sanity check VPN.
(bgp_mp_unreach_parse) Ditto.
* bgp_mplsvpn.c: Use the same VPN parsing code for both the sanity
check and the actual parse.
(bgp_nlri_parse_vpn) renamed to bgp_nlri_parse_vpn_body and made
internal.
(bgp_nlri_parse_vpn_body) Added (bool) argument to control whether it
is sanity checking or whether it should update routing state for each
NLRI. Send a NOTIFY and reset the session, if there's a parsing
error, as bgp_nlri_sanity_check_ip does, and as is required by the
RFC.
(bgp_nlri_parse_vpn) now a wrapper to call _body with update.
(bgp_nlri_sanity_check_vpn) wrapper to call parser without
updating.
* bgp_mplsvpn.h: (bgp_nlri_sanity_check_vpn) export for
bgp_nlri_sanity_check.
* bgp_packet.c: (bgp_update_receive) Adjust for bgp_nlri_sanity_check
argument changes.
* test/bgp_mp_attr_test.c: Extend to also test the NLRI parsing functions,
if the initial MP-attr parsing has succeeded. Fix the NLRI in the
VPN cases. Add further VPN tests.
* tests/bgpd.tests/testbgpmpattr.exp: Add the new test cases.
This commit a joint effort of:
Lou Berger <lberger@labn.net>
Donald Sharp <sharpd@cumulusnetworks.com>
Paul Jakma <paul.jakma@hpe.com> / <paul@jakma.org>
bgp_attr_flag_invalid can access beyond the last element of attr_flags_values.
Fix this by initializing attr_flags_values_max to the correct value.
Signed-off-by: Christian Franke <chris@opensourcerouting.org>
Signed-off-by: Christian Franke <chris@opensourcerouting.org>
Acked-by: Donald Sharp <sharpd@cumulusnetworks.com>
Description:
We use valgrind memcheck quite a bit to spot leaks in
our work with bgpd. In order to eliminate false positives,
we added code in the exit path to release the remaining
allocated memory.
Bgpd startup log message now includes pid.
Some little tweaks by Paul Jakma <paul.jakma@hpe.com>:
* bgp_mplsvpn.c: (str2prefix_rd) do the cleanup in common code at the end
and goto it.
[DL: dropped several chunks from original commit which are obsolete by
now on this tree.]
This patch is part of the previously submitted patch set on VPN and
Encap SAFIs. It fixes an issue identified by NetDEF CI.
Ensure temp stack structures are initialized Add protection against
double frees / post free access to bgp_attr_flush
Signed-off-by: Lou Berger <lberger@labn.net>
This feature adds an L3 & L2 VPN application that makes use of the VPN
and Encap SAFIs. This code is currently used to support IETF NVO3 style
operation. In NVO3 terminology it provides the Network Virtualization
Authority (NVA) and the ability to import/export IP prefixes and MAC
addresses from Network Virtualization Edges (NVEs). The code supports
per-NVE tables.
The NVE-NVA protocol used to communicate routing and Ethernet / Layer 2
(L2) forwarding information between NVAs and NVEs is referred to as the
Remote Forwarder Protocol (RFP). OpenFlow is an example RFP. For
general background on NVO3 and RFP concepts see [1]. For information on
Openflow see [2].
RFPs are integrated with BGP via the RF API contained in the new "rfapi"
BGP sub-directory. Currently, only a simple example RFP is included in
Quagga. Developers may use this example as a starting point to integrate
Quagga with an RFP of their choosing, e.g., OpenFlow. The RFAPI code
also supports the ability import/export of routing information between
VNC and customer edge routers (CEs) operating within a virtual
network. Import/export may take place between BGP views or to the
default zebera VRF.
BGP, with IP VPNs and Tunnel Encapsulation, is used to distribute VPN
information between NVAs. BGP based IP VPN support is defined in
RFC4364, BGP/MPLS IP Virtual Private Networks (VPNs), and RFC4659,
BGP-MPLS IP Virtual Private Network (VPN) Extension for IPv6 VPN . Use
of both the Encapsulation Subsequent Address Family Identifier (SAFI)
and the Tunnel Encapsulation Attribute, RFC5512, The BGP Encapsulation
Subsequent Address Family Identifier (SAFI) and the BGP Tunnel
Encapsulation Attribute, are supported. MAC address distribution does
not follow any standard BGB encoding, although it was inspired by the
early IETF EVPN concepts.
The feature is conditionally compiled and disabled by default.
Use the --enable-bgp-vnc configure option to enable.
The majority of this code was authored by G. Paul Ziemba
<paulz@labn.net>.
[1] http://tools.ietf.org/html/draft-ietf-nvo3-nve-nva-cp-req
[2] https://www.opennetworking.org/sdn-resources/technical-library
Now includes changes needed to merge with cmaster-next.
lib/zebra.h has FILTER_X #define's. These do not belong there.
Put them in lib/filter.h where they belong.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
(cherry picked from commit 0490729cc033a3483fc6b0ed45085ee249cac779)
VPNv6 changes picked from upstream needed fixes and updates due to some
fundamental changes implemented by Cumulus (BGP update-groups, RFC 5549
and nexthop setting etc.) which aren't present upstream.
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
Updates: 945c8fe, 8ecd326, bb86c60, 93b73df, f4c8985
This fixes some minor mixups particularly in MPLS-related SAFIs, as well
as doing some stylistic changes & adding comments.
Signed-off-by: Lou Berger <lberger@labn.net>
Reviewed-by: David Lamparter <equinox@opensourcerouting.org>
(cherry picked from commit 050defe816e4bd4cac7b028f69e45cb1974ca96d)
Conflicts:
bgpd/bgp_attr.c
bgpd/bgp_attr.h
bgpd/bgp_packet.c
bgpd/bgp_route.c
bgpd/bgp_route.h
There wasn't much missing for VPNv6 to begin with; just a few bits of
de- & encoding and a few lists to be updated.
Signed-off-by: Lou Berger <lberger@labn.net>
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
[Editorial note: Signed-off-by may imply an authorship claim, but need not]
Edited-by: Paul Jakma <paul.jakma@hpe.com> / <paul@jakma.org>
(cherry picked from commit 9da04bca0e994ec92b9242159bf27d89c6743354)
Conflicts:
bgpd/bgp_attr.c
bgpd/bgp_mplsvpn.c
bgpd/bgpd.c
* bgp_attr.c: Recent patch to tighten well-known attr checks and apply that
to all AFIs has some breakage with MP-extensions and GR, which needs to be
fixed.
(bgp_attr_check) Graceful Restart EoR can be an empty UPDATE for IPv4/uni.
MP-Ext allow UPDATE with just MP_UNREACH_NLRI. Check for these and return
proceed.
NEXT_HOP becomes optional, if MP_REACH_NLRI is present and there's no
v4 NLTI, update NEXT_HOP check accordingly.
Print the missing attr in string form in the log message.
(bgp_attr_parse) AS_PATH need not be there, so
bgp_attr_munge_as4_attrs call needs to be conditional on that.
(cherry picked from commit aed1b556cf2f55680ae09d7ad1a1f22729dea8c5)
Conflicts:
bgpd/bgp_attr.c
* ANVL testing by Martin Winter threw up a crash in bgpd in aspath_dup
called from bgp_packet_attribute, if attr->aspath was NULL, on an IPv6
UPDATE.
This root cause is that the checks for well-known, mandatory attributes
were being applied only if an UPDATE contained the IPv4 NLRI and the
peer was configured for v4/unicast (i.e. not deconfigured). This is
something inherited from GNU Zebra, and never noticed before.
* bgp_attr.c: (bgp_attr_parse) Move the well-known mandatory attribute
check to here, so that it can be run immediately after all attributes
are parsed, and before any further processing of attributes that might
assume the existence of WK/M attributes (e.g. AS4-Path).
(bgp_attr_munge_as4_attrs) Missing AS_PATH shouldn't happen here anymore,
but retain a check anyway for robustness - it's definitely a hard error
though.
* bgp_attr.h: (bgp_attr_check) No longer needs to be exported, make static.
* bgp_packet.c: (bgp_update_receive) Responsibility for well-known check
now in bgp_attr_parse.
(cherry picked from commit 055086f70febc30fdfd94bb4406e9075d6934cd8)
Conflicts:
bgpd/bgp_attr.c
bgpd/bgp_attr.h
bgpd/bgp_packet.c
Unfortunately, the attribute present bits for MP_REACH and MP_UNREACH
which 1a211cb ("bgpd: one more fix"...) tests for are never set in their
corresponding attribute parsing functions.
Reported-by: Martin Winter <mwinter@netdef.org>
Fixes: 1a211cb "bgpd: one more fix for tightening of check for missing well-known attributes"
Cc: Paul Jakma <paul@opensourcerouting.org>
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
(cherry picked from commit daefeb8755e194dd19a5f1910bc78d13c8147efb)
Remove some dead code and fix initialization of the
sockunion.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Reviewed-by: Daniel Walton <dwalton@cumulusnetworks.com>
Signed-off-by: Daniel Walton <dwalton@cumulusnetworks.com>
Reviewed-by: Donald Sharp <sharpd@cumulusnetworks.com>
Reviewed-by: Vivek Venkataraman <vivek@cumulusnetworks.com
Ticket: CM-8014
This implements addpath TX with the first feature to use it
being "neighbor x.x.x.x addpath-tx-all-paths".
One change to show output is 'show ip bgp x.x.x.x'. If no addpath-tx
features are configured for any peers then everything looks the same
as it is today in that "Advertised to" is at the top and refers to
which peers the bestpath was advertise to.
root@superm-redxp-05[quagga-stash5]# vtysh -c 'show ip bgp 1.1.1.1'
BGP routing table entry for 1.1.1.1/32
Paths: (6 available, best #6, table Default-IP-Routing-Table)
Advertised to non peer-group peers:
r1(10.0.0.1) r2(10.0.0.2) r3(10.0.0.3) r4(10.0.0.4) r5(10.0.0.5) r6(10.0.0.6) r8(10.0.0.8)
Local, (Received from a RR-client)
12.12.12.12 (metric 20) from r2(10.0.0.2) (10.0.0.2)
Origin IGP, metric 0, localpref 100, valid, internal
AddPath ID: RX 0, TX 8
Last update: Fri Oct 30 18:26:44 2015
[snip]
but once you enable an addpath feature we must display "Advertised to" on a path-by-path basis:
superm-redxp-05# show ip bgp 1.1.1.1/32
BGP routing table entry for 1.1.1.1/32
Paths: (6 available, best #6, table Default-IP-Routing-Table)
Local, (Received from a RR-client)
12.12.12.12 (metric 20) from r2(10.0.0.2) (10.0.0.2)
Origin IGP, metric 0, localpref 100, valid, internal
AddPath ID: RX 0, TX 8
Advertised to: r8(10.0.0.8)
Last update: Fri Oct 30 18:26:44 2015
Local, (Received from a RR-client)
34.34.34.34 (metric 20) from r3(10.0.0.3) (10.0.0.3)
Origin IGP, metric 0, localpref 100, valid, internal
AddPath ID: RX 0, TX 7
Advertised to: r8(10.0.0.8)
Last update: Fri Oct 30 18:26:39 2015
Local, (Received from a RR-client)
56.56.56.56 (metric 20) from r6(10.0.0.6) (10.0.0.6)
Origin IGP, metric 0, localpref 100, valid, internal
AddPath ID: RX 0, TX 6
Advertised to: r8(10.0.0.8)
Last update: Fri Oct 30 18:26:39 2015
Local, (Received from a RR-client)
56.56.56.56 (metric 20) from r5(10.0.0.5) (10.0.0.5)
Origin IGP, metric 0, localpref 100, valid, internal
AddPath ID: RX 0, TX 5
Advertised to: r8(10.0.0.8)
Last update: Fri Oct 30 18:26:39 2015
Local, (Received from a RR-client)
34.34.34.34 (metric 20) from r4(10.0.0.4) (10.0.0.4)
Origin IGP, metric 0, localpref 100, valid, internal
AddPath ID: RX 0, TX 4
Advertised to: r8(10.0.0.8)
Last update: Fri Oct 30 18:26:39 2015
Local, (Received from a RR-client)
12.12.12.12 (metric 20) from r1(10.0.0.1) (10.0.0.1)
Origin IGP, metric 0, localpref 100, valid, internal, best
AddPath ID: RX 0, TX 3
Advertised to: r1(10.0.0.1) r2(10.0.0.2) r3(10.0.0.3) r4(10.0.0.4) r5(10.0.0.5) r6(10.0.0.6) r8(10.0.0.8)
Last update: Fri Oct 30 18:26:34 2015
superm-redxp-05#
"bgp confederation id X" are the same value.
router bgp 1
bgp router-id 10.1.1.1
bgp confederation identifier 1
bgp confederation peers 24 35
neighbor 10.1.1.2 remote-as 24
neighbor 10.1.1.2 update-source lo
neighbor 10.1.1.3 remote-as 1
neighbor 10.1.1.3 update-source lo
The customer does this because they want to peer to 10.1.1.2 as a
confed-external peer but peer with 10.1.1.3 as a normal iBGP peer.
The bug was that we thought 10.1.1.3 was an EBGP peer so we did not send him
LOCALPREF which caused the Juniper to send us a NOTIFICATION. I confirmed
that quagga also sends a NOTIFICATION in this scenario.
The fix is to add a check to see if router bgp X and bgp confederation
identifier X are equal because that is a factor in determining if a peer is
EBGP or IBGP
Additional issues fixed in the this patch:
We were not properly removing all AS_CONFED_SEQUENCEs/SETs from the aspath
when advertising a route to an ebgp peer. This was due to two issues:
We only called aspath_delete_confed_seq() if confederations were
configured. We can RX as aspath with CONFED segments even if
confederations are not configured.
aspath_delete_confed_seq() was implemented based on the original confed
RFC 3065 which basically said "remove all of the leading
AS_CONFED_SEQUENCEs/SETs" where the new confed RFC 5065 says "remove ALL
of the AS_CONFED_SEQUENCEs/SETs"
peer-groups did not work for confed-external peers. peer_calc_sort() always
returned BGP_PEER_EBGP for a confederations where the remote-as was not
specified. The reason was the peer->as_type was AS_UNSPECIFIED but we checked
if (peer->as_type != AS_SPECIFIED)
return (peer->as_type == AS_INTERNAL ? BGP_PEER_IBGP : BGP_PEER_EBGP);
After fixing that I found that when we got to the else where we checked for
peer1 we could only possibly return BGP_PEER_IBGP or BGP_PEER_EBGP, we need
to also be able to return BGP_PEER_CONFED. I changed this to return
peer1->sort.
"show ip bgp x.x.x.x" would always display "Local" for the aspath. This is
because we were calling aspath_counts_hop() to determine if the aspath was
empty. This is wrong though because CONFED segments do not count towards
aspath hopcount. The fix is to null check aspath->segments to determine if
the aspath is actually empty.
"show ip bgp x.x.x.x" and "show ip bgp neighbor" always displayed
"internal" or "external" and never "confed-internal" or "confed-external".
This made troubleshooting difficult because I couldn't tell exactly what
kind of peer I was dealing with. I added the confed-internal and
confed-external output...also added a "peer-type" field in the json output
for 'show ip bgp x.x.x.x'
"show ip bgp peer-group" did not list the peer-group name if we hadn't
determined the "type" (internal, external, etc) for the peer-group
This adds support for BGP RFC 5549 (Extended Next Hop Encoding capability)
* send and receive of the capability
* processing of IPv4->IPv6 next-hops
* for resolving these IPv6 next-hops, itsworks with the current
next-hop-tracking support
* added a new message type between BGP and Zebra for such route
install/uninstall
* zserv side of changes to process IPv4 prefix ->IPv6 next-hops
* required show command changes for IPv4 prefix having IPv6 next-hops
Few points to note about the implementation:
* It does an implicit next-hop-self when a [IPv4 prefix -> IPv6 LL next-hop]
is to be considered for advertisement to IPv4 peering (or IPv6 peering
without Extended next-hop capability negotiated)
* Currently feature is off by default, enable it by configuring
'neighbor <> capability extended-nexthop'
* Current support is for IPv4 Unicast prefixes only.
IMPORTANT NOTE:
This patch alone isn't enough to have IPv4->IPv6 routes installed into
the kernel. A separate patch is needed for that to work for the netlink
interface.
Signed-off-by: Vipin Kumar <vipin@cumulusnetworks.com>
Reviewed-by: Daniel Walton <dwalton@cumulusnetworks.com>
Vivek Venkatraman <vivek@cumulusnetworks.com>
Donald Sharp <sharpd@cumulusnetworks.com>
change processing etc.) that refer to the BGP instance, the correct BGP
instance must be referenced and not the default BGP instance. The default
BGP instance is the first instance on the instance list. In a scenario
where one BGP instance is deleted (through operator action such as a
"no router bgp" command) and another instance exists or is created, there
may still be events in-flight that need to be processed against the
deleted instance. Trying to process these against the default instance
is erroneous. The calls to bgp_get_default() must be limited to the user
interface (vtysh) context.
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
This patch implements the 'update-groups' functionality in BGP. This is a
function that can significantly improve BGP performance for Update generation
and resultant network convergence. BGP Updates are formed for "groups" of
peers and then replicated and sent out to each peer rather than being formed
for each peer. Thus major BGP operations related to outbound policy
application, adj-out maintenance and actual Update packet formation
are optimized.
BGP update-groups dynamically groups peers together based on configuration
as well as run-time criteria. Thus, it is more flexible than update-formation
based on peer-groups, which relies on operator configuration.
[Note that peer-group based update formation has been introduced into BGP by
Cumulus but is currently intended only for specific releases.]
From 11098af65b2b8f9535484703e7f40330a71cbae4 Mon Sep 17 00:00:00 2001
Subject: [PATCH] updgrp commits
Summary of changes
- added an option to enable keepalive debugs for a specific peer
- added an option to enable inbound and/or outbound updates debugs for a specific peer
- added an option to enable update debugs for a specific prefix
- added an option to enable zebra debugs for a specific prefix
- combined "deb bgp", "deb bgp events" and "deb bgp fsm" into "deb bgp neighbor-events". "deb bgp neighbor-events" can be enabled for a specific peer.
- merged "deb bgp filters" into "deb bgp update"
- moved the per-peer logging to one central log file. We now have the ability to filter all verbose debugs on a per-peer and per-prefix basis so we no longer need to keep log files per-peer. This simplifies troubleshooting by keeping all BGP logs in one location. The use
r can then grep for the peer IP they are interested in if they wish to see the logs for a specific peer.
- Changed "show debugging" in isis to "show debugging isis" to be consistent with all other protocols. This was very confusing for the user because they would type "show debug" and expect to see a list of debugs enabled across all protocols.
- Removed "undebug" from the parser for BGP. Again this was to be consisten with all other protocols.
- Removed the "all" keyword from the BGP debug parser. The user can now do "no debug bgp" to disable all BGP debugs, before you had to type "no deb all bgp" which was confusing.
The new parse tree for BGP debugging is:
deb bgp as4
deb bgp as4 segment
deb bgp keepalives [A.B.C.D|WORD|X:X::X:X]
deb bgp neighbor-events [A.B.C.D|WORD|X:X::X:X]
deb bgp nht
deb bgp updates [in|out] [A.B.C.D|WORD|X:X::X:X]
deb bgp updates prefix [A.B.C.D/M|X:X::X:X/M]
deb bgp zebra
deb bgp zebra prefix [A.B.C.D/M|X:X::X:X/M]
- Schedule write thread for advertisements and withdraws only if corresponding
FIFOs are growing and/or upon work_queue getting fully processed.
- Set non-default yield time for the main work_queue, as the default value
of 10ms results in yielding after processing very few nodes.
- Remove unnecessary scheduling of write thread when update packet is formed.
- If MRAI is 0, don't start a timer unnecessarily, directly schedule write
thread.
- Some debugs.
Credit
------
A huge amount of credit for this patch goes to Piotr Chytla for
their 'route tags support' patch that was submitted to quagga-dev
in June 2007.
Documentation
-------------
All ipv4 and ipv6 static route commands now have a "tag" option
which allows the user to set a tag between 1 and 65535.
quagga(config)# ip route 1.1.1.1/32 10.1.1.1 tag ?
<1-65535> Tag value
quagga(config)# ip route 1.1.1.1/32 10.1.1.1 tag 40
quagga(config)#
quagga# show ip route 1.1.1.1/32
Routing entry for 1.1.1.1/32
Known via "static", distance 1, metric 0, tag 40, best
* 10.1.1.1, via swp1
quagga#
The route-map parser supports matching on tags and setting tags
!
route-map MATCH_TAG_18 permit 10
match tag 18
!
!
route-map SET_TAG_22 permit 10
set tag 22
!
BGP and OSPF support:
- matching on tags when redistribing routes from the RIB into BGP/OSPF.
- setting tags when redistribing routes from the RIB into BGP/OSPF.
BGP also supports setting a tag via a table-map, when installing BGP
routes into the RIB.
Signed-off-by: Daniel Walton <dwalton@cumulusnetworks.com>
COMMAND:
Possible forms of the command configuration:
[no] bgp max-med administrative
[no] bgp max-med administrative <max-med-value>
[no] bgp max-med on-startup <period>
[no] bgp max-med on-startup <period> <max-med-value>
DESCRIPTION:
'administrative' takes effect from the time of the config until the config is
removed.
'on-startup' is effective only at the startup time for the given '<period>'
after the first peer is established.
'<max-med-value>' is used as the MED value to be sent out when the max-med
is effective. Default max-med value is 4294967294.
NOTE:
When max-med is active, MED is changed only in the outgoing attributes to the
peers, it doesn't modify any MED specific state of the attributes in BGP on
the local node.
Signed-off-by: Vipin Kumar <vipin@cumulusnetworks.com>
Reviewed-by: Daniel Walton <dwalton@cumulusnetworks.com>
COMMAND:
table-map <route-map-name>
DESCRIPTION:
This feature is used to apply a route-map on route updates from BGP to Zebra.
All the applicable match operations are allowed, such as match on prefix,
next-hop, communities, etc. Set operations for this attach-point are limited
to metric and next-hop only. Any operation of this feature does not affect
BGPs internal RIB.
Supported for ipv4 and ipv6 address families. It works on multi-paths as well,
however, metric setting is based on the best-path only.
IMPLEMENTATION NOTES:
The route-map application at this point is not supposed to modify any of BGP
route's attributes (anything in bgp_info for that matter). To achieve that,
creating a copy of the bgp_attr was inevitable. Implementation tries to keep
the memory footprint low, code comments do point out the rationale behind a
few choices made.
bgp_zebra_announce() was already a big routine, adding this feature would
extend it further. Patch has created a few smaller routines/macros whereever
possible to keep the size of the routine in check without compromising on the
readability of the code/flow inside this routine.
For updating a partially filtered route (with its nexthops), BGP to Zebra
replacement semantic of the next-hops serves the purpose well. However, with
this patch there could be some redundant withdraws each time BGP announces a
route thats (all the nexthops) gets denied by the route-map application.
Handling of this case could be optimized by keeping state with the prefix and
the nexthops in BGP. The patch doesn't optimizing that case, as even with the
redundant withdraws the total number of updates to zebra are still be capped
by the total number of routes in the table.
Signed-off-by: Vipin Kumar <vipin@cumulusnetworks.com>
Reviewed-by: Pradosh Mohapatra <pmohapat@cumulusnetworks.com>
Most of the attribute parsing functions were already sending a notify,
let's clean up the code to make it happen only once.
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
Quagga sources have inherited a slew of Page Feed (^L, \xC) characters
from ancient history. Among other things, these break patchwork's
XML-RPC API because \xC is not a valid character in XML documents.
Nuke them from high orbit.
Patches can be adapted simply by:
sed -e 's%^L%%' -i filename.patch
(you can type page feeds in some environments with Ctrl-V Ctrl-L)
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
ISSUE:
Currently, for non-ipv4-unicast address families where prefixes are
encoded in MP_REACH/MP_UNREACH attributes, BGP ends up sending one
prefix per UPDATE message. This is quite inefficient. The patch
addresses the issue.
PATCH:
We introduce a scratch buffer in the peer structure that stores the
MP_REACH/MP_UNREACH attributes for non-ipv4-unicast families. This
enables us to encode multiple prefixes. In the end, the two buffers
are merged to create the UPDATE packet.
Signed-off-by: Pradosh Mohapatra <pmohapat@cumulusnetworks.com>
Reviewed-by: Daniel Walton <dwalton@cumulusnetworks.com>
[DL: removed no longer existing bgp_packet_withdraw prototype]
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
ISSUE:
Suppose route1 and route2 received from route-reflector-client1 and client2
respectively have identical attributes. The current logic of creating the
adj-rib-out for a peer threads the 'adv' structures for both routes against
the same attribute. This results in 'bgp_update_packet()' to pack those
routes in the same UPDATE message with one attr structure formatted. The
originator-id is thus set according to the first route's received router id.
This is incorrect.
PATCH:
Fix bgp_announce_check() function to set the originator-id in the
advertising attr structure. Also, fix the attribute hash function and
compare function to consider originator-id. Otherwise attributes where all
fields except the originator-id are identical get merged into one memory
location.
Signed-off-by: Pradosh Mohapatra <pmohapat at cumulusnetworks.com>
Reviewed-by: Scott Feldman <sfeldma at cumulusnetworks.com>
Reviewed-by: Ken Yin <kyin at cumulusnetworks.com>
[DL: whitespace changes dropped]
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
* bgp_attr.c: this UNSET_FLAG()s are bogus. I did a quick review and
I think that they could not cause any bug anyway.
Signed-off-by: Jorge Boncompte [DTI2] <jorge@dti2.net>
Acked-by: Feng Lu <lu.feng@6wind.com>
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
bgp_attr_munge_as4_attrs would previously try to reintegrate an AS4_PATH
with a NULL AS_PATH, leading to a rather nasty SEGV. Let's go by
RFC6793 and treat missing AS_PATH as 0-length AS_PATH, which in turn
means discarding the AS4_PATH.
[NB: we don't actually stick to the actual rule, which is discarding
AS4_PATH if it's longer than AS_PATH; indeed we should probably fix that
too]
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
Commit 558d1fec11 improved bgp_attr_dup so it would be possible
for the caller to provide attr_extra, allowing to use the stack instead
of the heap for operations requiring only a short lived attr.
However, this commit introduced a bug where bgp_attr_dup wouldn't copy
attr_extra at all (but provide a reference to the original) if the
caller provided attr_extra.
Cc: Jorge Boncompte [DTI2] <jorge@dti2.net>
Signed-off-by: Christian Franke <chris@opensourcerouting.org>
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
When going through the code to write the documentation for local-as,
I discovered that one of the comments was out-of-date.
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
Added replace-as modifier for BGP neighbors when using
local-as. If the replace-as modifier is specified, only the
replacement AS as specified by the local-as modifier is
prepended to the AS_PATH, not the process's AS.
In bgp_attr.c, I decided that
if (peer->change_local_as) {
/* If replace-as is specified, we only use the change_local_as when
advertising routes. */
if( ! CHECK_FLAG (peer->flags, PEER_FLAG_LOCAL_AS_REPLACE_AS) ) {
aspath = aspath_add_seq (aspath, peer->local_as);
}
aspath = aspath_add_seq (aspath, peer->change_local_as);
} else {
aspath = aspath_add_seq (aspath, peer->local_as);
}
was clearer than the alternative that didn't duplicate the prepending of the
process's AS:
/* First, append the process local AS unless we have an alternate local_as
* and we're replacing it (as opposed to just prepending it). */
if (! (peer->change_local_as
&& CHECK_FLAG (peer->flags, PEER_FLAG_LOCAL_AS_REPLACE_AS) ) ) {
aspath = aspath_add_seq (aspath, peer->local_as);
}
if (peer->change_local_as)
aspath = aspath_add_seq (aspath, peer->change_local_as);
}
But I could be convinced otherwise.
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
Use the array_size() helper macro. Replaces several instances of local
macros with the same definition.
Reviewed-by: Scott Feldman <sfeldma@cumulusnetworks.com>
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
Try to use on stack structs for temporary uses.
Signed-off-by: Jorge Boncompte [DTI2] <jorge@dti2.net>
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
Reduce memory heap fragmentation and pressure on the memory allocator.
Signed-off-by: Jorge Boncompte [DTI2] <jorge@dti2.net>
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
Reduce memory heap fragmentation and pressure on the memory allocator.
Signed-off-by: Jorge Boncompte [DTI2] <jorge@dti2.net>
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
peer_sort() it's called so much as to be annoying. In the assumption
that the 'sort' of the peer doesn't change during an established session,
I have changed all calls to peer_sort() in the 'fast-path' to only check
the 'sort'. All the calls from the vty and such still recalculate the sort
and store it in the peer.
There's a lot of other calls to peer_sort() that could be changed but some
maube tricky, someone more knowledgeable may try to reduce them.
This hits peer_sort() from 5th out of the stadium^H^H list on a full
internet table loading profiling session.
Signed-off-by: Jorge Boncompte [DTI2] <jorge@dti2.net>
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
* bgpd/bgp_attr.c: (bgp_attr_flags_diagnose) debug code for error-handling
paths probably shouldn't assert, instead it should just log that there
was no problem.
* bgpd/bgp_attr.c: (bgp_attr_parse) the invalid flag check call to
bgp_attr_malformed is pretty useless if it doesn't actually allow
for the PROCEED non-error case.
* bgp_attr.c: (bgp_attr_malformed) When a malformed attribute error can be
ignored, and BGP message processing may still proceed, the stream getp
should be adjusted to the end of the attribute - the caller may not have
consumed all the attribute. Problem noted by Martin Winter in bug 678.
Also, rename the 'startp' local to 'notify_datap', for clarity.
* bgp_attr.h: (struct bgp_attr_parser_args) Attribute parsing context,
containing common arguments.
* bgp_attr.c: (general) Move the bgp_attr_flag_invalid flag-check calls up,
out of each individual attr parser function, to be done once in attr_parse.
Similarly move the calculation of the 'total' attribute length field up
to attr_parse.
Bundle together common arguments to attr-parsing functions and helpers
into (struct bgp_attr_parser_args), so it can be passed by reference down
the stack & also de-clutter the argument lists & make it easier to
add/modify the context for attr-parsing - add local const aliases to avoid
modifying body of code too much. This also should help avoid cut & paste
errors, where calls to helpers with hard-coded attribute types are pasted
to other functions but the code isn't changed.
(bgp_attr_flags_diagnose) as above.
(bgp_attr_flag_invalid) as above.
(bgp_attr_{origin,aspath,as4_path,nexthop,med,local_pref,atomic}) as above.
(bgp_attr_{aggregator,as4_aggregator,community,originator_id}) as above
(bgp_attr_{cluster_list,ext_communities},bgp_mp_{un,}reach_parse) as above
(bgp_attr_unknown) as above.
(bgp_attr_malformed) as above. Also, startp and length have to be
special-cased, because whether or not to send attribute data depends
on the particular error - a separate length argument, distinct from
args->length, indicates whether or not the attribute data should be sent
in the NOTIFY.
(bgp_attr_aspath_check) Call to bgp_attr_malformed is wrong here, there is
no attribute parsing context - e.g. the 'flag' argument is unlikely to be
right, remove it. Explicitly handle the error instead.
(bgp_attr_munge_as4_attrs) Flag argument is pointless.
As the comment notes, the check here is pointless as AS_PATH presence
already checked elsewhere.
(bgp_attr_parse) Do bgp_attr_flag_invalid call here.
Use (struct bgp_attr_parser_args) for args to attr parser functions.
Remove out-of-context 'flag' argument to as4 checking functions.
* bgpd/bgp_attr.c: (attr_flags_values []) array of required flags for
attributes, EXTLEN & PARTIAL masked off as "dont care" as appropriate.
(bgp_attr_flag_invalid) check if flags may be invalid, according to
the above table & RFC rules.
(bgp_attr_*) Use bgp_attr_flag_invalid.
(bgp_attr_as4_aggregator) ditto, also take startp argument for the
NOTIFY data.
(bgp_attr_parse) pass startp to bgp_attr_as4_aggregator
* bgpd/bgp_attr.c: (bgp_attr_aspath) error message could be misleading,
clearly log what flag was incorrect.
(Problem noted in "bgpd: fix error message in bgp_attr_aspath()" in
Quagga-RE)
Commit 05a4936b713b9882171d0f7fb20b8439df23939e fixed some of the
attributes involved, but not all. This commit should do it.
* bgp_attr.c
* bgp_attr_originator_id()
* bgp_attr_cluster_list()
* bgp_mp_reach_parse()
* bgp_mp_unreach_parse()
Some of the recent attribute flags/length checks copied from QRE use
bgp_notify_send_with_data() directly, but master branch assumes
using bgp_attr_malformed().
* bgp_attr.c
* bgp_attr_med()
* bgp_attr_local_pref()
* bgp_attr_atomic()
* bgp_attr_originator_id()
* bgp_attr_cluster_list()
* bgp_mp_reach_parse()
* bgp_mp_unreach_parse()
* bgp_attr.[ch]
* bgp_mp_reach_parse(): add extra arguments and a uniform flag
check block
* bgp_mp_unreach_parse(): idem
* bgp_attr_parse(): provide extra arguments
* bgp_mp_attr_test.c
* parse_test(): justify respective calls
* bgp_attr.c
* bgp_attr_cluster_list(): accept extra argument, add checks for
"optional", "transitive" and "partial" bits, log each error
condition independently
* bgp_attr_parse(): provide extra arguments
* bgp_attr.c
* bgp_attr_originator_id(): accept extra argument, add checks for
"optional", "transitive" and "partial" bits, log each error
condition independently
* bgp_attr_parse(): provide extra arguments
Commit 2febf323411c1aed9d7694898f852ce2ef36a7e5 assumed every flag
bit except optional/transitive/partial unset, which at times could
not be true for "extended length" bit.
* bgp_attr.c
* bgp_attr_origin(): exclude BGP_ATTR_FLAG_EXTLEN from comparison
* bgp_attr_nexthop(): idem
* bgp_attr_med(): idem
* bgp_attr_local_pref(): idem
* bgp_attr_atomic(): idem
* bgp_attr.c
* bgp_attr_parse(): provide extra argument to bgp_attr_aggregator()
* bgp_attr_local_pref(): use bgp_notify_send_with_data()
* bgp_attr_atomic(): idem
* bgp_attr_aggregator(): idem
Conflicts:
bgpd/bgp_attr.c
Do not check each of the Optional/Transitive/Partial attribute
flag bits, when their only valid combination is known in advance,
but still perform bit-deep error message logging. This change
assumes unused (low-order) 4 bits of the flag octet cleared.
* bgp_attr.c
* bgp_attr_origin(): rewrite check
* bgp_attr_nexthop(): idem
* bgp_attr_med(): idem
* bgp_attr_local_pref(): idem
* bgp_attr_atomic(): idem
Conflicts:
bgpd/bgp_attr.c
ORIGIN handling function used to have "partial" bit check and recent
commits added it for NEXT_HOP, MULTI_EXIT_DISC and ATOMIC_AGGREGATE
cases. This commit adds "partial" check for AS_PATH and LOCAL_PREF
cases, which should leave attributes 1 through 6 inclusive completely
covered with attribute flags checks.
* bgp_attr.c
* bgp_attr_origin(): use bit-by-bit checks for better diagnostics
* bgp_attr_aspath(): add flag check
* bgp_attr_local_pref(): idem
Conflicts:
bgpd/bgp_attr.c
* lib/prefix.h
* IPV4_CLASS_DE(): new helper macro
* bgp_attr.c
* bgp_attr_nexthop(): add check for "partial" bit, refresh flag error
reporting, explain meaning of RFC4271 section 6.3 and implement it
Conflicts:
bgpd/bgp_attr.c
(with resolved conflict in bgpd/bgp_packet.c)
Two macros resolving to the same integer constant broke a case block and
a more thorough merge of BGP_SAFI_VPNV4 and BGP_SAFI_VPNV6 was
performed.
* bgpd.h: MPLS-labeled VPN SAFI is AFI-independent, switch to single
* macro
* bgp_capability_test.c: update test data
* bgp_mp_attr_test.c: idem
* bgp_route.c: (bgp_maximum_prefix_overflow, bgp_table_stats_vty) update
macro and check conditions (where appropriate)
* bgp_packet.c: (bgp_route_refresh_send, bgp_capability_send,
bgp_update_receive, bgp_route_refresh_receive): idem
* bgp_open.c: (bgp_capability_vty_out, bgp_afi_safi_valid_indices,
bgp_open_capability_orf, bgp_open_capability): idem
* bgp_attr.c: (bgp_mp_reach_parse, bgp_packet_attribute,
bgp_packet_withdraw): idem
* bgp_attr.c
* bgp_attr_atomic(): accept extra argument, add checks for
"optional", "transitive" and "partial" bits, log each error
condition independently
* bgp_attr_parse(): provide extra argument
* bgp_attr.c
* bgp_attr_local_pref(): accept extra argument, add checks for
"optional" and "transitive" bits, log each error condition
independently
* bgp_attr_parse(): provide extra argument
Contains BGP fixes:
- set extcommunity crash: tihs patch tries to make the refcounting more robust
but does not fully solve the problem, sadly.
- BGP attribute error handling: Little testing.
* bgp_attr.c: (attrhash_key_make) 98e30f should have changed jhash2 to jhash.
These kinds of merge errors would be reduced and life would be easier if
people would submit fully-formed fixes that could be chucked directly into
git-am.