Commit Graph

5401 Commits

Author SHA1 Message Date
David Ahern
57ac202c78 Update kernel headers
Update kernel headers to commit
ae40832e53c3 ("bpfilter: fix a build err")

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-05-30 08:06:19 -07:00
Stephen Hemminger
65083b5fe3 ip: defer lookup interface index
The ip command would always lookup the network device index
even when not necessary. This slows down operations like creating
lots of VLAN's.

David reported the original issue, this is an alternative patch
that solves it in a slightly more general method.

Using iproute2 to create a bridge and add 4094 vlans to it can take from
2 to 3 *minutes*. The reason is the extraneous call to ll_name_to_index.
ll_name_to_index results in an ioctl(SIOCGIFINDEX) call which in turn
invokes dev_load. If the index does not exist, which it won't when
creating a new link, dev_load calls modprobe twice -- once for
netdev-NAME and again for NAME. This is unnecessary overhead for each
link create.

When ip link is invoked for a new device, there is no reason to
call ll_name_to_index for the new device. With this patch, creating
a bridge and adding 4094 vlans takes less than 3 *seconds*.

	old:
	# time ip -batch ip-vlan.batch
	real    3m13.727s
	user    0m0.076s
	sys     0m1.959s

	new:
	# time ip -batch ip-vlan.batch
	real    0m3.222s
	user    0m0.044s
	sys     0m1.777s

Reported-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-05-25 07:48:40 -07:00
David Ahern
39d16a02d9 ip route: Print expires as signed int
rta_expires is a signed int; print it as one.

Fixes: 663c3cb231 ("iproute: implement JSON and color output")
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-05-23 15:29:56 -07:00
Pavel Maltsev
e2f5ceccda Allow to configure /var/run/netns directory
Currently NETNS_RUN_DIR is hardcoded and refers to /var/run/netns.
However, some systems (e.g. Android) doesn't have /var
which results in error attempts to create network namespaces on these
systems.  This change makes NETNS_RUN_DIR configurable at build time
by allowing to pass environment variable to make command.
Also, this change makes /etc/netns directory configurable through
NETNS_ETC_DIR environment variable.

For example: ./configure && NETNS_RUN_DIR=/mnt/vendor/netns make

Tested: verified that iproute2 with configuration mentioned above
creates namespaces in /mnt/vendor/netns

Signed-off-by: Pavel Maltsev <pavelm@google.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-05-23 15:16:53 -07:00
Jiri Pirko
852ed60528 devlink: introduce support for showing port flavours
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-05-23 12:58:55 -07:00
David Ahern
c2a569e63c Update kernel headers
Update kernel headers to commit
e89e59c08d1b ("Merge branch 'net-sfp-small-improvements'")

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-05-23 12:58:34 -07:00
David Ahern
24bd5ac6b8 Merge branch 'rdma-resource-tracking' into iproute2-next
Steve Wise  says:

====================

This series enhances the iproute2 rdma tool to include displaying
driver-specific resource attributes.  It is the user-space part of the
kernel driver resource tracking series that has been accepted for merging
into linux-4.18 [1]

If there are no additional review comments, it can now be merged, I think.

Changes since v2:
- resync rdma_netlink.h to fix uapi break

Changes since v1:
- commit log editorial fixes
- cite kernel commits that updated rdma_netlink.h in the
  iproute2 commit syncing this header
- reorder stack definitions ala "reverse christmas tree"
- correctly handle unknown driver attributes when printing

Changes since v0/rfc:
- changed "provider" to "driver" based on kernel side changes
- updated man pages
- removed "RFC" tag

Thanks,

Steve.

[1] https://www.spinics.net/lists/linux-rdma/msg64199.html

====================

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-05-18 09:21:42 -07:00
Steve Wise
853d222d78 rdma: update man pages
Update the man pages for the resource attributes as well
as the driver-specific attributes.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-05-18 09:20:01 -07:00
Steve Wise
331152752a rdma: print driver resource attributes
This enhancement allows printing rdma device-specific state, if provided
by the kernel. This is done in a generic manner, so rdma tool doesn't
need to know about the details of every type of rdma device.

Driver attributes for a rdma resource are in the form of <key,
[print_type], value> tuples, where the key is a string and the value can
be any supported driver attribute. The print_type attribute, if present,
provides a print format to use vs the standard print format for the type.
For example, the default print type for a PROVIDER_S32 value is "%d ",
but "0x%x " if the print_type of PRINT_TYPE_HEX is included inthe tuple.

Driver resources are only printed when the -dd flag is present.
If -p is present, then the output is formatted to not exceed 80 columns,
otherwise it is printed as a single row to be grep/awk friendly.

Example output:

# rdma resource show qp lqpn 1028 -dd -p
link cxgb4_0/- lqpn 1028 rqpn 0 type RC state RTS rq-psn 0 sq-psn 0 path-mig-state MIGRATED pid 0 comm [nvme_rdma]
    sqid 1028 flushed 0 memsize 123968 cidx 85 pidx 85 wq_pidx 106 flush_cidx 85 in_use 0
    size 386 flags 0x0 rqid 1029 memsize 16768 cidx 43 pidx 41 wq_pidx 171 msn 44 rqt_hwaddr 0x2a8a5d00
    rqt_size 256 in_use 128 size 130

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-05-18 09:17:16 -07:00
Steve Wise
366d20b91f rdma: update rdma_netlink.h to get new driver attributes
Pull in the rdma_netlink.h changes from kernel
commits:

25a0ad85156a ("RDMA/nldev: Add explicit pad attribute")
da5c85078215 ("RDMA/nldev: add driver-specific resource tracking)"
0d52d803767e ("RDMA/uapi: Fix uapi breakage")

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-05-18 09:16:22 -07:00
Jon Maloy
5947046dd9 tipc: fixed node and name table listings
We make it easier for users to correlate between 128-bit node
identities and 32-bit node hash number by extending the 'node list'
command to also show the hash number.

We also improve the 'nametable show' command to show the node identity
instead of the node hash number. Since the former potentially is much
longer than the latter, we make room for it by eliminating the (to the
user) irrelevant publication key. We also reorder some of the columns so
that the node id comes last, since this looks nicer and is more logical.

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-05-18 09:12:24 -07:00
Roman Mashak
53d34eb66c tc: add missing space symbol in ife output
In order to make TDC tests match the output patterns, the missing space
character must be added in the mode output string.

Fixes: 8744c5d338 ("tc: jsonify ife action")
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-05-18 09:10:48 -07:00
Marcelo Ricardo Leitner
ac6a4c2299 tc: flower: add support for verbose logging
Currently there is no way to log offloading errors if the rule is not
explicitly marked as skip_sw, making it hard for other applications such
as Open vSwitch to log why a given could not be offloaded.

This patch adds support for signaling the kernel that more verbose
logging is wanted, which now will include such messages.

Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-05-18 09:06:04 -07:00
David Ahern
4276e65290 Update kernel headers
Update kernel headers to commit
64a2658b58ab ("net: mscc: Add SPDX identifier")

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-05-18 09:05:07 -07:00
Stephen Hemminger
405e0c4ffe tc: allow 0% for percent options
Allowing 0% is sometimes useful for example in netem loss and drop
or perhaps dropping all traffic in a HTB bin.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=199745
Reported-by: stuartmarsden@gmail.com
Fixes: 927e3cfb52 ("tc: B.W limits can now be specified in %.")
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-05-17 16:20:50 -07:00
Marcelo Ricardo Leitner
4f59f4a5af tc-netem: fix limit description in man page
As the kernel code says, limit is actually the amount of packets it can
hold queued at a time, as per:

static int netem_enqueue(struct sk_buff *skb, struct Qdisc *sch,
                         struct sk_buff **to_free)
{
	...
        if (unlikely(sch->q.qlen >= sch->limit))
                return qdisc_drop_all(skb, sch, to_free);

So lets fix the description of the field in the man page.

Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-05-16 15:36:48 -07:00
Marcelo Ricardo Leitner
3f2c23811d tc-netem: fix limit description in man page
As the kernel code says, limit is actually the amount of packets it can
hold queued at a time, as per:

static int netem_enqueue(struct sk_buff *skb, struct Qdisc *sch,
                         struct sk_buff **to_free)
{
	...
        if (unlikely(sch->q.qlen >= sch->limit))
                return qdisc_drop_all(skb, sch, to_free);

So lets fix the description of the field in the man page.

Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-05-16 14:16:29 -07:00
David Ahern
961d0991bc Merge branch 'iproute2-master' into iproute2-next
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-05-16 14:10:27 -07:00
Luca Boccassi
9b13cc98f5 ip: do not drop capabilities if net_admin=i is set
Users have reported a regression due to ip now dropping capabilities
unconditionally.
zerotier-one VPN and VirtualBox use ambient capabilities in their
binary and then fork out to ip to set routes and links, and this
does not work anymore.

As a workaround, do not drop caps if CAP_NET_ADMIN (the most common
capability used by ip) is set with the INHERITABLE flag.
Users that want ip vrf exec to work do not need to set INHERITABLE,
which will then only set when the calling program had privileges to
give itself the ambient capability.

Fixes: ba2fc55b99 ("Drop capabilities if not running ip exec vrf with libcap")

Signed-off-by: Luca Boccassi <bluca@debian.org>
2018-05-14 21:07:34 -07:00
David Ahern
4aba7dc4f4 Merge branch 'iproute2-master' into iproute2-next
Conflicts:
	rdma/include/uapi/rdma/rdma_netlink.h

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-05-09 21:04:16 -07:00
GhantaKrishnamurthy MohanKrishna
7d40bdbc8d tipc: Add support to set and get MTU for UDP bearer
In this commit we introduce the ability to set and get
MTU for UDP media and bearer.

For set and get properties such as tolerance, window and priority,
we already do:

    $ tipc media set PPROPERTY media MEDIA
    $ tipc media get PPROPERTY media MEDIA

    $ tipc bearer set OPTION media MEDIA ARGS
    $ tipc bearer get [OPTION] media MEDIA ARGS

The same has been extended for MTU, with an exception to support
only media type UDP.

Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: GhantaKrishnamurthy MohanKrishna <mohan.krishna.ghanta.krishnamurthy@ericsson.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-05-09 20:53:32 -07:00
David Ahern
fd95ec0e8e Update kernel headers
Update kernel headers to commit 53a7bdfb2a27
("dt-bindings: dsa: Remove unnecessary #address/#size-cells")

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-05-09 20:52:52 -07:00
Stephen Hemminger
10f687736b ss: remove non-functional slabinfo
Ss was using slabinfo to try and intuit TCP statistics.
The slabinfo changed several times since 2.4 and all these statistics
are broken by renames and slab merging. Plus slabinfo does not exist
at all if kernel is compiled with SLUB option.

Rather than trying to fix kernel, just trim away the no longer
valid statistics.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-05-09 13:57:08 -07:00
Stephen Hemminger
9b2ab68516 rdma: add ib header files
The iproute2 header files must be complete to allow builds on
other places where some of the headers are not present.

For example, iproute2 is built on Windows Services for Linux
as a test tool. With the partial addition of rdma it was broken.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-05-09 08:14:55 -07:00
Stephen Hemminger
36eece51e3 rdma: align headers with upstream
This makes rdma/include/uapi/rdma headers align with those produced
by doing make headers_install from upstream (Linus) tree.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-05-09 08:12:13 -07:00
Jakub Kicinski
0c0394ff83 bpf: don't offload perf array maps
Perf arrays are handled specially by the kernel, don't request
offload even when used by an offloaded program.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-05-05 11:08:00 -07:00
David Ahern
7732148d1d Merge branch 'iproute2-master' into iproute2-next
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-05-05 11:07:47 -07:00
Ido Schimmel
c0ec7c9f87 iproute: Parse last nexthop in a multipath route
Continue parsing a multipath payload as long as another nexthop can fit
in the payload.

# ip route add 192.0.2.0/24 nexthop dev dummy0 nexthop dev dummy1

Before:
# ip route show 192.0.2.0/24
192.0.2.0/24
        nexthop dev dummy0 weight 1

After:
# ip route show 192.0.2.0/24
192.0.2.0/24
        nexthop dev dummy0 weight 1
        nexthop dev dummy1 weight 1

Fixes: f48e14880a ("iproute: refactor multipath print")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-05-01 19:29:44 -07:00
Baruch Siach
37bf5c6fcb arpd: remove pthread dependency
Explicit link with pthread is not needed when linking dynamically. Even
static link with recent libdb does not pull in the code that uses
pthread. Finally, the configure check introduced in commit a25df4887d
(configure: Check for Berkeley DB for arpd compilation) does not add
-lpthread to its link command.

This change allows arpd build with toolchains that do not provide
threads support.

Signed-off-by: Baruch Siach <baruch@tkos.co.il>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-05-01 19:29:03 -07:00
Baruch Siach
3b07981a27 README: update libdb build dependency information
Debian does not distribute libdb4.x-dev for quite some time now. Current
stable carries libdb5.3-dev. Update the wording accordingly.

Signed-off-by: Baruch Siach <baruch@tkos.co.il>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-05-01 19:29:03 -07:00
Toke Høiland-Jørgensen
4db2ff0db4 json_print: Fix hidden 64-bit type promotion
print_uint() will silently promote its variable type to uint64_t, but there
is nothing that ensures that the format string specifier passed along with
it fits (and the function name suggest to pass "%u").

Fix this by changing print_uint() to use a native 'unsigned int' type, and
introduce a separate print_u64() function for printing 64-bit values. All
call sites that were actually printing 64-bit values using print_uint() are
converted to use print_u64() instead.

Since print_int() was already using native int types, just add a
print_s64() to match, but don't convert any call sites. For symmetry,
also add a print_luint() method (with no users).

Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-04-25 11:08:55 -07:00
Toke Høiland-Jørgensen
bf717756b5 ingress: Don't break JSON output
The dash printed by the ingress qdisc breaks JSON output, so only print it
in regular output mode.

Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-04-25 11:08:39 -07:00
Hangbin Liu
8f01001abc vxlan: add ttl auto in help message
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-04-23 19:43:46 -07:00
Sabrina Dubroca
7f520601f5 gre/gre6: allow clearing {,i,o}{key,seq,csum} flags
Currently, iproute allows setting those flags, but it's impossible to
clear them, since their current value is fetched from the kernel and
then we OR in the additional flags passed on the command line.

Add no* variants to allow clearing them.

Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-04-23 19:42:58 -07:00
Sabrina Dubroca
d21c028cf7 man: ip link: document GRE tunnels
GRE tunnels are currently only documented together with IPIP and SIT
tunnels, but they actually have very different configuration
options. Let's separate them.

Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-04-23 19:42:44 -07:00
David Ahern
0d93d1e736 Merge branch 'master' into iproute2-next
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-04-23 19:42:21 -07:00
Jakub Kicinski
f5393225f9 iplink_geneve: correct size of message to avoid spurious errors
Commit 6c4b672738 ("iplink_geneve: Get rid of inet_get_addr()")
inadvertently changed the parameter to addattr_l() resulting in:

addattr_l ERROR: message exceeded bound of 4

when remote is specified.

Fixes: 6c4b672738 ("iplink_geneve: Get rid of inet_get_addr()")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
2018-04-20 10:39:53 -07:00
Stephen Hemminger
260a92afe6 bpf: fix warnings on gcc-8 about string truncation
In theory, the path for BPF could exceed the 4K PATH_MAX.
In practice, not really possible. But shut up gcc.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-04-20 10:38:00 -07:00
Roman Mashak
0aaf62fcb6 tc: return on invalid smac or dmac in ife action
Return on invalid smac/dmac and use invarg consistently for invalid
arguments report.

Signed-off-by: Roman Mashak <mrv@mojatatu.com>
2018-04-20 10:35:21 -07:00
Stephen Hemminger
0b01f088ee flower: use 16 bit format where possible
Should use print_hu not print_uint for 16 bit value.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-04-20 10:35:00 -07:00
Stephen Hemminger
7cd3f08b6f ipneigh: fix missing format specifier
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-04-20 10:33:22 -07:00
Hangbin Liu
5b1c363c7b vxlan: fix ttl inherit behavior
Like kernel net-next commit 72f6d71e491e6 ("vxlan: add ttl inherit support"),
vxlan ttl inherit should means inherit the inner protocol's ttl value.

But currently when we add vxlan with "ttl inherit", we only set ttl 0,
which is actually use whatever default value instead of inherit the inner
protocol's ttl value.

To make a difference with ttl inherit and ttl == 0, we add an attribute
IFLA_VXLAN_TTL_INHERIT when "ttl inherit" specified. And use "ttl auto"
to means "use whatever default value", the same behavior with ttl == 0.

Reported-by: Jianlin Shi <jishi@redhat.com>
Suggested-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-04-19 11:11:27 -07:00
David Ahern
075bf62a70 Update kernel headers
Update kernel headers to commit 292eba02dbb4
("net-next/hinic: add arm64 support")

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-04-19 11:10:27 -07:00
David Ahern
d42c7891d2 utils: Do not reset family for default, any, all addresses
Thomas reported a change in behavior with respect to autodectecting
address families. Specifically, 'ip ro add default via fe80::1'
syntax was failing to treat fe80::1 as an IPv6 address as it did in
prior releases. The root causes appears to be a change in family when
the default keyword is parsed.

'default', 'any' and 'all' are relevant outside of AF_INET. Leave the
family arg as is for these when setting addr.

Fixes: 93fa12418d ("utils: Always specify family and ->bytelen in get_prefix_1()")
Reported-by: Thomas Deutschmann <whissi@gentoo.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
Cc: Serhey Popovych <serhe.popovych@gmail.com>
2018-04-16 17:00:48 -07:00
Jakub Sitnicki
ee53b42fd8 iproute: Abort if nexthop cannot be parsed
Attempt to add a multipath route where a nexthop definition refers to a
non-existent device causes 'ip' to crash and burn due to stack buffer
overflow:

  # ip -6 route add fd00::1/64 nexthop dev fake1
  Cannot find device "fake1"
  Cannot find device "fake1"
  Cannot find device "fake1"
  ...
  Segmentation fault (core dumped)

Don't ignore errors from the helper routine that parses the nexthop
definition, and abort immediately if parsing fails.

Signed-off-by: Jakub Sitnicki <jkbs@redhat.com>
2018-04-16 16:58:38 -07:00
Roman Mashak
8744c5d338 tc: jsonify ife action
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-04-15 17:23:17 -07:00
Roman Mashak
7b17701717 tc: jsonify skbedit action
v2:
   FIxed strings format in print_string()

Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-04-15 17:09:16 -07:00
Stephen Hemminger
811ee8943c uapi/sctp: update header from 4.17-rc1
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-04-10 10:50:00 -07:00
Stephen Hemminger
b7d3a4f009 uapi/tipc: update header from 4.17-rc1
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-04-10 10:49:41 -07:00
Stephen Hemminger
dcf7997bcd uapi/bpf: update kernel header from 4.17-rc1
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-04-10 10:48:56 -07:00