The code is a bit messy, as it starts with space after text and at some
point switches to space before text. But either way, printing space
before *and* after text almost certainly leads to printing more
whitespace than necessary.
Signed-off-by: Phil Sutter <phil@nwl.cc>
Prior to this patch, If one route entry's RTA_PREFSRC and RTA_GATEWAY
both were NULL, it was supposed to be restored ONLY as a local address.
But as it didn't check tb[RTA_PREFSRC] when restoring local networks,
rtattr_cmp would return a success if it was NULL, this route entry would
be restored again as a local network.
This patch is to add tb[RTA_PREFSRC] check when restoring local networks.
Fixes: 74af8dd962 ("ip route: restore route entries in correct order")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Tested-by: Phil Sutter <phil@nwl.cc>
This big patch was compiled by vimgrepping for memset calls and changing
to C99 initializer if applicable. One notable exception is the
initialization of union bpf_attr in tc/tc_bpf.c: changing it would break
for older gcc versions (at least <=3.4.6).
Calls to memset for struct rtattr pointer fields for parse_rtattr*()
were just dropped since they are not needed.
The changes here allowed the compiler to discover some unused variables,
so get rid of them, too.
Signed-off-by: Phil Sutter <phil@nwl.cc>
Acked-by: David Ahern <dsa@cumulusnetworks.com>
Sometimes we cannot restore route entries, because in kernel
[1] fib_check_nh()
[2] fib_valid_prefsrc()
cause some routes to depend on existence of others while adding.
For example, we saved all the routes, and flushed all tables
[a] default via 192.168.122.1 dev eth0
[b] 192.168.122.0/24 dev eth0 src 192.168.122.21
[c] broadcast 127.0.0.0 dev lo table local src 127.0.0.1
[d] local 127.0.0.0/8 dev lo table local src 127.0.0.1
[e] local 127.0.0.1 dev lo table local src 127.0.0.1
[f] broadcast 127.255.255.255 dev lo table local src 127.0.0.1
[g] broadcast 192.168.122.0 dev eth0 table local src 192.168.122.21
[h] local 192.168.122.21 dev eth0 table local src 192.168.122.21
[i] broadcast 192.168.122.255 dev eth0 table local src 192.168.122.21
Now start to restore them:
If we want to add [a], we have to add [b] first, as [1] and
'via 192.168.122.1' in [a].
If we want to add [b], we have to add [h] first, as [2] and
'src 192.168.122.21' in [b].
So the correct order to restore should be like:
[e][h] -> [b][c][d][f][g][i] -> [a]
This patch fixes it by traversing the file 3 times, it only restores
part of them in each run according to the following conditions, to
make sure every entry can be restored successfully.
1. !gw && (!fib_prefsrc || fib_prefsrc == cfg->fc_dst)
2. !gw && (fib_prefsrc != cfg->fc_dst)
3. gw
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Phil Sutter <phil@nwl.cc>
Add vrf keyword to 'ip route' commands. Allows:
1. Users can list routes by VRF name:
$ ip route show vrf NAME
VRF tables have all routes including local and broadcast routes.
The VRF keyword filters LOCAL and BROADCAST routes; to see all
routes the table option can be used. Or to see local routes only
for a VRF:
$ ip route show vrf NAME type local
2. Add or delete a route for a VRF:
$ ip route {add|delete} vrf NAME <route spec>
3. Do a route lookup for a VRF:
$ ip route get vrf NAME ADDRESS
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Currently a timeout is multiplied by HZ in user-space and
then it multiplied by HZ in kernel-space.
$ ./ip/ip r add 2002::0/64 dev veth1 expires 10
$ ./ip/ip -6 r
2002::/64 dev veth1 metric 1024 linkdown expires 996sec pref medium
Cc: Xin Long <lucien.xin@gmail.com>
Cc: Hangbin Liu <liuhangbin@gmail.com>
Cc: Stephen Hemminger <shemming@brocade.com>
Fixes: 68eede2505 ("route: allow routes to be configured with expire values")
Signed-off-by: Andrew Vagin <avagin@virtuozzo.com>
There is only a single user who needs it to be reentrant (not really,
but it's safer like this), add rt_addr_n2a_r() for it to use.
Signed-off-by: Phil Sutter <phil@nwl.cc>
There are only three users which require it to be reentrant, the rest is
fine without. Instead, provide a reentrant format_host_r() for users
which need it.
Signed-off-by: Phil Sutter <phil@nwl.cc>
This is a bit pedantic, but brackets ([]) show optional values and since
TYPE must not become empty, they're not suited to surround the type
keyword choices. Use curly braces instead.
Also add some missing whitespace to the parameter list above.
Signed-off-by: Phil Sutter <phil@nwl.cc>
This patch adds support to add mpls multipath
routes.
example:
ip -f mpls route add 100 \
nexthop as 200 via inet 10.1.1.2 dev swp1 \
nexthop as 700 via inet 10.1.1.6 dev swp2
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
the warning was:
iproute.c:301:12: warning: 'val' may be used uninitialized in this
function [-Wmaybe-uninitialized]
features &= ~RTAX_FEATURE_ECN;
^
iproute.c:575:10: note: 'val' was declared here
__u32 val;
^
Signed-off-by: Zhang Shengju <zhangshengju@cmss.chinamobile.com>
Commit 0f7543322c ("route: ignore RTAX_HOPLIMIT of value -1")
accidentally reordered fprintf statements. This patch restores the
original ordering.
Fixes: 0f7543322c ("route: ignore RTAX_HOPLIMIT of value -1")
Signed-off-by: Phil Sutter <phil@nwl.cc>
Older kernels use -1 internally as indicator to use the sysctl default,
but they still export the setting. Newer kernels use 0 to indicate that
(which is why the conversion from -1 to 0 was done here), but they also
stopped exporting the value. Since the meaning of -1 is clear, treat it
equally like default on newer kernels (which is to not print anything).
Signed-off-by: Phil Sutter <phil@nwl.cc>
Technically, the range of possible hoplimit values are defined by IPv4
and IPv6 header formats. Both define the field to be eight bits in size,
which leads to a value range of [0;255]. Setting a packet's hoplimit
field to 0 though makes not much sense, as the next hop would
immediately drop the packet. Therefore Linux uses 0 as a special value
indicating to use the system's default hoplimit (configurable via
sysctl). In iproute, setting the hoplimit of a route to 0 is equivalent
to omitting the hoplimit parameter alltogether, so it is actually not
necessary to allow that value to be specified, but keep it anyway for
backwards compatibility.
Signed-off-by: Phil Sutter <phil@nwl.cc>
If get_rt_realms() fails, try to get a possible raw u32 realms
value for the u32 RTA_FLOW/FRA_FLOW attribute, as it might be
useful to directly configure the hex value itself. And only if
that fails, then bail out.
The source realm is provided in the upper u16 (mask: 0xffff0000)
and the destination realm through the lower u16 part (mask:
0x0000ffff). This can be useful for tc's bpf realm matcher, but
also a full hex/mask param can be provided already for matching
through iptables' --realm cmdline option, for example.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
This patch adds support to parse and print lwtunnel
encapsulation attributes attached to routes for MPLS
and IP tunnels.
example:
Add ipv4 route with mpls encap attributes:
Examples:
MPLS:
$ ip route add 40.1.2.0/30 encap mpls 200 via inet 40.1.1.1 dev eth3
$ ip route show
40.1.2.0/30 encap mpls 200 via 40.1.1.1 dev eth3
Add ipv4 multipath route with mpls encap attributes:
$ ip route add 10.1.1.0/30 nexthop encap mpls 200 via 10.1.1.1 dev eth0 \
nexthop encap mpls 700 via 40.1.1.2 dev eth3
$ ip route show
10.1.1.0/30
nexthop encap mpls 200 via 10.1.1.1 dev eth0 weight 1
nexthop encap mpls 700 via 40.1.1.2 dev eth3 weight 1
IP:
$ ip route add 10.1.1.1/24 encap ip id 200 dst 20.1.1.1 dev vxlan0
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Acked-by: Jiri Benc <jbenc@redhat.com>
Currently 'ip route get' does not show the table the lookup result comes
from and prior to kernel commit c36ba6603a11 the response from the kernel
was hardcoded to the main table. From the discussion this appears to be
a leftover from the route cache where the cached entry lost the table id
and so the result was hardcoded to main table.
c36ba6603a11 added the RTM_F_LOOKUP_TABLE flag to maintain that behavior
but to allow new tools to ask for the actual table id for the lookup.
This patch adds that flag to ip route get request and if the result is
not the main table shows the table id.
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Currently when we specify AF_INET6 when it is disabled, we will get
all routes.
For example, we can boot kernel with ipv6.disable=1 and try to get ipv6
routes:
$ ip -6 route show
default via 192.168.122.1 dev eth0 proto static metric 100
192.168.122.0/24 dev eth0 proto kernel scope link src 192.168.122.141 metric 100
Here are ipv4 routes and this is unexpected behaviour.
Signed-off-by: Andrew Vagin <avagin@openvz.org>
This patch replaces exits with returns in
ip route get command handling. This allows batching
of ip route get commands.
$cat route_get_batch.txt
route get 10.0.14.2
route get 12.0.14.2
route get 10.0.14.4
$ip -batch route_get_batch.txt
local 10.0.14.2 dev lo src 10.0.14.2
cache <local>
12.0.14.2 via 192.168.0.2 dev eth0 src 192.168.0.15
cache
10.0.14.4 dev dummy0 src 10.0.14.2
cache
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
This patch fixes incorrect -EINVAL errors due to invalid
scope and type during mpls route deletes.
$ip -f mpls route add 100 as 200 via inet 10.1.1.2 dev swp1
$ip -f mpls route show
100 as to 200 via inet 10.1.1.2 dev swp1
$ip -f mpls route del 100 as 200 via inet 10.1.1.2 dev swp1
RTNETLINK answers: Invalid argument
$ip -f mpls route del 100
RTNETLINK answers: Invalid argument
After patch:
$ip -f mpls route show
100 as to 200 via inet 10.1.1.2 dev swp1
$ip -f mpls route del 100 as 200 via inet 10.1.1.2 dev swp1
$ip -f mpls route show
Always set type to RTN_UNICAST for mpls route add/deletes.
Also to keep things consistent with kernel set scope to
RT_SCOPE_UNIVERSE for both mpls and ipv6 routes. Both mpls and ipv6 route
deletes ignore scope.
Suggested-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: Vivek Venkataraman <vivek@cumulusnetworks.com>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
There have been several instances where response from kernel
has overrun the stack buffer from the caller. Avoid future problems
by passing a size argument.
Also drop the unused peer and group arguments to rtnl_talk.
If kernel complains about ip route request, exit status should be
2 not 1.
This fixes regression introduced by:
commit 42ecedd4ba
Author: Roopa Prabhu <roopa@cumulusnetworks.com>
Date: Tue Mar 17 19:26:32 2015 -0700
fix ip -force -batch to continue on errors
The kernel now has the capability to offload FDB and FIB entries to hardware.
It is important to let users know if table entries are also offloaded to
hardware. Currently offloaded FDB entries are indicated by the existence of
the flag 'external' on the entry as of the following commit:
commit 28467b7f3f
Author: Scott Feldman <sfeldma@gmail.com>
Date: Thu Dec 4 09:57:15 2014 +0100
bridge/fdb: add flag/indication for FDB entry synced from offload device
When the patch to add support for indicating that FIB entries were also
offloaded as posted to netdev by Scott Feldman it became clear that 'external'
would not be an ideal name for routes. There could definitely be confusion
about what this might mean since many routes are to external networks -- a
collision/confusion that did not happen with FDB.
Scott Feldman asked me to check with others and build concensus around a name.
After speaking with several people about this I am proposing we refer to both
FDB and FIB entries that are currently backed by hardware (based on the work
done in rocker) with the flag 'offload' appended to the end ofthe entry.
Some people liked the string 'external,' others liked 'hardware,' but the point
is to communicate that these routes are available to something that will will
offload the forwarding normally done by the kernel. Since the term 'offload'
is used so frequently it seems appropriate to use the same language in
ip/bridge output.
The term 'offload' also seems to resonate with many of the people who have
responded on Scott's original thread or to those who I reached out to directly
and did respond to my query, so it seems we have reached consensus that it
should be the term used going forward.
v2: rebased against net-next branch
Signed-off-by: Andy Gospodarek <gospo@cumulusnetworks.com>
CC: Jamal Hadi Salim <jhs@mojatatu.com>
CC: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
CC: Jiri Pirko <jiri@resnulli.us>
CC: John W. Linville <linville@tuxdriver.com>
CC: Roopa Prabhu <roopa@cumulusnetworks.com>
CC: Scott Feldman <sfeldma@gmail.com>
CC: Stephen Hemminger <stephen@networkplumber.org>
This allows querying and setting the route preference. It's usually set from
the IPv6 Neighbor Discovery Router Advertisement messages.
Introduced in "ipv6: expose RFC4191 route preference via rtnetlink", enqueued
for Linux 4.1.
Signed-off-by: Lubomir Rintel <lkundrak@v3.sk>
- Pull in the uapi mpls.h
- Update rtnetlink.h to include the mpls rtnetlink notification multicast group.
- Define AF_MPLS in utils.h if it is not defined from elsewhere
as is done with AF_DECnet
The address syntax for multiple mpls labels is a complete invention.
When I looked there seemed to be no wide spread convention for talking
about an mpls label stack in text for. Sometimes people did:
"{ Label1, Label2, Label3 }", sometimes people would do:
"[ label3, label2, label1 ]", and most of the time label
stacks were not explicitly shown at all.
The syntax I wound up using, so it would not have spaces and so it
would visually distinct from other kinds of addresses is.
label1/label2/label3 Where label1 is the label at the top of the label
stack and label3 is the label at the bottom on the label stack.
When there is a single label this matches what seems to be convention
with other tools. Just print out the numeric value of the mpls label.
The netlink protocol for labels uses the on the wire format for a
label stack. The ttl and traffic class are expected to be 0. Using
the on the wire format is common and what happens with other address
types. BGP when passing label stacks also uses this technique with the
exception that the ttl byte is not included making each label in a BGP
label stack 3 bytes instead of 4.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
This attribute is like RTA_DST except it specifies the destination
address to place on a packet when it leaves the host. For ip based
protocols this is destination NAT and not a common part of forwarding.
For protocols like MPLS label swapping is something that typically
happens on every hop.
There is likely to be a RTA_NEWSRC at some point so RTA_NEWDST
is printed as "as to" and can be specified either as "as to"
or just "as"
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Add support for the RTA_VIA attribute that specifies an address family
as well as an address for the next hop gateway.
To make it easy to pass this reorder inet_prefix so that it's tail
is a proper RTA_VIA attribute.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
For some address families (like AF_PACKET) it is helpful to have the
length when prenting the address.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
On ip route print dump, label externally offloaded routes with "external".
Offloaded routes are flagged with RTNH_F_EXTERNAL, a recent additon to
net-next. For example:
$ ip route
default via 192.168.0.2 dev eth0
11.0.0.0/30 dev swp1 proto kernel scope link src 11.0.0.2 external
11.0.0.4/30 via 11.0.0.1 dev swp1 proto zebra metric 20 external
11.0.0.8/30 dev swp2 proto kernel scope link src 11.0.0.10 external
11.0.0.12/30 via 11.0.0.9 dev swp2 proto zebra metric 20 external
12.0.0.2 proto zebra metric 30 external
nexthop via 11.0.0.1 dev swp1 weight 1
nexthop via 11.0.0.9 dev swp2 weight 1
12.0.0.3 via 11.0.0.1 dev swp1 proto zebra metric 20 external
12.0.0.4 via 11.0.0.9 dev swp2 proto zebra metric 20 external
192.168.0.0/24 dev eth0 proto kernel scope link src 192.168.0.15
Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Reviewed-by: Jiri Pirko <jiri@resnulli.us>
This patch replaces exits with returns in several
iproute2 commands. This fixes `ip -batch -force`
to not exit but continue on errors.
$cat c.txt
route del 1.2.3.0/24 dev eth0
route del 1.2.4.0/24 dev eth0
route del 1.2.5.0/24 dev eth0
route add 1.2.3.0/24 dev eth0
$ip -force -batch c.txt
RTNETLINK answers: No such process
Command failed c.txt:2
RTNETLINK answers: No such process
Command failed c.txt:3
Reported-by: Sven-Haegar Koch <haegar@sdinet.de>
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
This patch adds configuration and dumping of congestion control metric
for ip route, for example:
ip route add <dst> dev foo congctl [lock] dctcp
Reference: http://thread.gmane.org/gmane.linux.network/344733
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
This permits to selectively enable explicit congestion notification via
the routing table.
If this ecn feature is not set, the kernel will use the tcp_ecn sysctl
to decide wheter to use ECN when establising a TCP connection.
At the time of this writing, the kernel supports ecn and allfrags, but
allfrags is of dubious value and not implemented here.
Example:
ip route change 192.168.2.0/24 dev eth0 features ecn
Signed-off-by: Florian Westphal <fw@strlen.de>