Commit Graph

3572 Commits

Author SHA1 Message Date
Stephen Hemminger
1d2cfcf8b5 update kernel headers
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-10-27 08:31:26 +02:00
Stephen Hemminger
7fde8cfddc include: add TCP fastopen option
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-10-27 08:30:48 +02:00
Stephen Hemminger
fa19d6bc01 bpf: update header file
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-10-27 08:28:36 +02:00
Hangbin Liu
86bf43c7c2 lib/libnetlink: update rtnl_talk to support malloc buff at run time
This is an update for 460c03f3f3 ("iplink: double the buffer size also in
iplink_get()"). After update, we will not need to double the buffer size
every time when VFs number increased.

With call like rtnl_talk(&rth, &req.n, NULL, 0), we can simply remove the
length parameter.

With call like rtnl_talk(&rth, nlh, nlh, sizeof(req), I add a new variable
answer to avoid overwrite data in nlh, because it may has more info after
nlh. also this will avoid nlh buffer not enough issue.

We need to free answer after using.

Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: Phil Sutter <phil@nwl.cc>
2017-10-26 12:29:29 +02:00
Hangbin Liu
2d34851cd3 lib/libnetlink: re malloc buff if size is not enough
With commit 72b365e8e0 ("libnetlink: Double the dump buffer size")
we doubled the buffer size to support more VFs. But the VFs number is
increasing all the time. Some customers even use more than 200 VFs now.

We could not double it everytime when the buffer is not enough. Let's just
not hard code the buffer size and malloc the correct number when running.

Introduce function rtnl_recvmsg() to always return a newly allocated buffer.
The caller need to free it after using.

Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: Phil Sutter <phil@nwl.cc>
2017-10-26 12:29:29 +02:00
Stephen Hemminger
66e40a4a86 update headers for TC and TIPC from net-next 2017-10-25 12:40:47 +02:00
Stephen Hemminger
2ac0c6c2c1 Merge branch 'master' into net-next 2017-10-25 12:39:18 +02:00
Jamal Hadi Salim
35f2a7639d tc/actions: introduce support for jump action
Sample use case:

... add ingress qdisc
sudo $TC qdisc add dev $ETH ingress

 ... if we exceed rate of 1kbps (burst of 90K), do an absolute jump of 2 actions
sudo $TC actions add action police rate 1kbit burst 90k conform-exceed jump 2 / pipe

sudo $TC -s actions ls action police
 action order 0:  police 0x4 rate 1Kbit burst 23440b mtu 2Kb action jump 2/pipe overhead 0b
 ref 1 bind 0 installed 41 sec used 41 sec
 Action statistics:
  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
  backlog 0b 0p requeues 0

... lets add a couple of marks so we can use them to mark exceed/not exceed
sudo $TC actions add action skbedit mark 11 ok index 11
sudo $TC actions add action skbedit mark 12 ok index 12

... if we dont exceed our rate we get a mark of 11, else mark of 12
sudo $TC filter add dev $ETH parent ffff: protocol ip prio 8 u32 \
match ip dst 127.0.0.8/32 flowid 1:10 \
action police index 4 \
action skbedit index 11 \
action skbedit index 12

Ok, lets keep this thing a little busy..
sudo ping -f -c 10000 127.0.0.8

... now lets see the filters..
sudo $TC -s filter ls dev $ETH parent ffff: protocol ip
filter pref 8 u32 chain 0
filter pref 8 u32 chain 0 fh 800: ht divisor 1
filter pref 8 u32 chain 0 fh 800::800 order 2048 key ht 800 bkt 0 flowid 1:10 not_in_hw  (rule hit 20000 success 10000)
  match 7f000008/ffffffff at 16 (success 10000 )
	action order 1:  police 0x4 rate 1Kbit burst 23440b mtu 2Kb action jump 2/pipe overhead 0b
	ref 2 bind 1 installed 198 sec used 2 sec
	Action statistics:
	Sent 840000 bytes 10000 pkt (dropped 0, overlimits 9721 requeues 0)
	backlog 0b 0p requeues 0

	action order 2:  skbedit mark 11 pass
	 index 11 ref 2 bind 1 installed 127 sec used 2 sec
 	Action statistics:
	Sent 23436 bytes 279 pkt (dropped 0, overlimits 0 requeues 0)
	backlog 0b 0p requeues 0

	action order 3:  skbedit mark 12 pass
	 index 12 ref 2 bind 1 installed 127 sec used 2 sec
 	Action statistics:
	Sent 816564 bytes 9721 pkt (dropped 0, overlimits 0 requeues 0)
	backlog 0b 0p requeues 0

As can be seen 97.21% of the packets were marked as exceeding the allocated
rate; you could do something clever with the skb mark after this.

Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-10-25 12:33:46 +02:00
Nikolay Aleksandrov
a5e3f41b4d ip: bridge_slave: add neigh_suppress to the type help and
Add neigh_suppress to the type help and document it in ip-link's man page.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-10-23 14:46:24 +02:00
Stephen Hemminger
702631416e Merge branch 'master' into net-next 2017-10-23 14:44:55 +02:00
Roman Mashak
c4be5febaa ss: initialize 'fackets' member of tcpstat structure
'fackets' has never been initialized with kernel extracted information, thus
never really printed.

Signed-off-by: Roman Mashak <mrv@mojatatu.com>
2017-10-23 14:43:11 +02:00
Michal Kubecek
21503ed2af ip maddr: fix filtering by device
Commit 530903dd90 ("ip: fix igmp parsing when iface is long") uses
variable len to keep trailing colon from interface name comparison.  This
variable is local to loop body but we set it in one pass and use it in
following one(s) so that we are actually using (pseudo)random length for
comparison. This became apparent since commit b48a1161f5 ("ipmaddr: Avoid
accessing uninitialized data") always initializes len to zero so that the
name comparison is always true. As a result, "ip maddr show dev eth0" shows
IPv4 multicast addresses for all interfaces.

Instead of keeping the length, let's simply replace the trailing colon with
a null byte. The bonus is that we get correct interface name in ma.name.

Fixes: 530903dd90 ("ip: fix igmp parsing when iface is long")
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Acked-by: Phil Sutter <phil@nwl.cc>
Acked-by: Petr Vorel <pvorel@suse.cz>
2017-10-21 15:02:24 +02:00
Phil Sutter
572e893613 ss: Detect IPPROTO_ICMPV6 sockets
Prefix IPPROTO_ICMPV6 sockets with 'icmp6' instead of '???'.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2017-10-21 15:00:16 +02:00
Phil Sutter
1267c0b924 ss: Distinguish between IPv4 and IPv6 wildcard sockets
Commit aba9c23a6e ("ss: enclose IPv6 address in brackets") unified
display of wildcard sockets in IPv4 and IPv6 to print the unspecified
address as '*'. Users then complained that they can't distinguish
between address families anymore, so change this again to what Stephen
Hemminger suggested:

| *:80    << both IPV6 and IPV4
| [::]:80 << IPV6_ONLY
| 0.0.0.0:80  << IPV4_ONLY

Note that on older kernels which don't support INET_DIAG_SKV6ONLY
attribute, pure IPv6 sockets will still show as '*'.

Cc: Humberto Alves <hjalves@live.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Phil Sutter <phil@nwl.cc>
2017-10-21 14:59:29 +02:00
Stephen Hemminger
4b4dde0ae6 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/shemminger/iproute2 2017-10-18 17:11:50 -07:00
Nikolay Aleksandrov
fdbdd356f0 ip: bridge_slave: add support for per-port group_fwd_mask
This patch adds the iproute2 support for getting and setting the
per-port group_fwd_mask. It also tries to resolve the value into a more
human friendly format by printing the known protocols instead of only
the raw value.
The man page is also updated with the new option.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
2017-10-16 09:26:05 -07:00
Stephen Hemminger
75209f840b Merge branch 'master' into net-next 2017-10-16 09:25:56 -07:00
Petr Vorel
4b73d52f8a color: Rename enum
COLOR_NONE is more descriptive than COLOR_CLEAR.

Signed-off-by: Petr Vorel <petr.vorel@gmail.com>
2017-10-16 09:24:11 -07:00
Petr Vorel
99b89c518e color: Cleanup code to remove "magic" offset + 7
Signed-off-by: Petr Vorel <petr.vorel@gmail.com>
2017-10-16 09:24:11 -07:00
Petr Vorel
24b058a2a4 color: Fix another ip segfault when using --color switch
Commit 959f1428 ("color: add new COLOR_NONE and disable_color function")
introducing color enum COLOR_NONE, which is not only duplicite of
COLOR_CLEAR, but also caused segfault, when running ip with --color
switch, as 'attr + 8' in color_fprintf() access array item out of
bounds. Thus removing it and restoring "magic" offset + 7.

Reproduce with:
$ ip -c a

Signed-off-by: Petr Vorel <petr.vorel@gmail.com>
2017-10-16 09:24:11 -07:00
Petr Vorel
e6849a5722 color: Fix ip segfault when using --color switch
Commit d0e72011 ("ip: ipaddress.c: add support for json output")
introduced passing -1 as enum color_attr. This is not only wrong as no
color_attr has value -1, but also causes another segfault in color_fprintf()
on this setup as there is no item with index -1 in array of enum attr_colors[].
Using COLOR_CLEAR is valid option.

Reproduce with:
$ COLORFGBG='0;15' ip -c a

NOTE: COLORFGBG is environmental variable used for defining whether user
has light or dark background.
COLORFGBG="0;15" is used to ask for color set suitable for light background,
COLORFGBG="15;0" is used to ask for color set suitable for dark background.

Signed-off-by: Petr Vorel <petr.vorel@gmail.com>
2017-10-16 09:24:11 -07:00
Petr Vorel
f1241a7e3b tests: Revert back /bin/sh in shebang
This was added by mistake in commit ecd44e68
("tests: Remove bashisms (s/source/.)")

Signed-off-by: Petr Vorel <petr.vorel@gmail.com>
2017-10-16 09:22:01 -07:00
Stephen Hemminger
4c6080b5c4 Merge branch 'master' into net-next 2017-10-12 09:06:10 -07:00
Stephen Hemminger
268a9eee98 netem: fix code indentation
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-10-11 18:08:15 -07:00
Stephen Hemminger
4999c57733 Merge branch 'master' into net-next 2017-10-11 11:07:20 -07:00
Ivan Delalande
da9cc6ab90 ss: print MD5 signature keys configured on TCP sockets
These keys are reported by kernel 4.14 and later under the
INET_DIAG_MD5SIG attribute, when INET_DIAG_INFO is requested (ss -i)
and we have CAP_NET_ADMIN. The additional output looks like:

	md5keys:fe80::/64=signing_key,10.1.2.0/24=foobar,::1/128=Test

Signed-off-by: Ivan Delalande <colona@arista.com>
2017-10-11 11:04:47 -07:00
Ivan Delalande
7c72df5a95 utils: add print_escape_buf to format and print arbitrary bytes
Keep it as simple as possible for now: just escape anything that is not
isprint-able, is among the "escape" parameter or '\' as an octal escape
sequence. This should be pretty easy to extend if any other user needs
something more complex in the future.

Signed-off-by: Ivan Delalande <colona@arista.com>
2017-10-11 11:04:47 -07:00
Baruch Siach
4f6b73380d lib: fix multiple strlcpy definition
Some C libraries, like uClibc and musl, provide BSD compatible
strlcpy(). Add check_strlcpy() to configure, and avoid defining strlcpy
and strlcat when the C library provides them.

This fixes the following static link error with uClibc-ng:

.../sysroot/usr/lib/libc.a(strlcpy.os): In function `strlcpy':
strlcpy.c:(.text+0x0): multiple definition of `strlcpy'
../lib/libutil.a(utils.o):utils.c:(.text+0x1ddc): first defined here
collect2: error: ld returned 1 exit status

Acked-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Baruch Siach <baruch@tkos.co.il>
2017-10-11 11:02:13 -07:00
Petr Vorel
ecd44e6805 tests: Remove bashisms (s/source/.)
Signed-off-by: Petr Vorel <petr.vorel@gmail.com>
2017-10-11 10:59:50 -07:00
Roopa Prabhu
41973a47dd iplink: new option to set neigh suppression on a bridge port
neigh suppression can be used to suppress arp and nd flood
to bridge ports. It maps to the recently added
kernel support for bridge port flag IFLA_BRPORT_NEIGH_SUPPRESS.

Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
2017-10-11 10:56:36 -07:00
Yotam Gigi
2055bf15f1 ip: mroute: Print offload indication
Since kernel net-next commit c7c0bbeae950 ("net: ipmr: Add MFC offload
indication") the kernel indicates on an MFC entry whether it was offloaded
using the RTNH_F_OFFLOAD flag. Update the "ip mroute show" command to
indicate when a route is offloaded, similarly to the "ip route show"
command.

Example output:
$ ip mroute
(0.0.0.0, 239.255.0.1)      Iif: sw1p7  Oifs: t_br0 State: resolved offload
(192.168.1.1, 239.255.0.1)  Iif: sw1p7  Oifs: sw1p4 State: resolved offload

Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
2017-10-11 10:54:27 -07:00
Stefan Hajnoczi
c759116a0b ss: add AF_VSOCK support
The AF_VSOCK address family is a host<->guest communications channel
supported by VMware, KVM, and Hyper-V.  Initial VMware support was
released in Linux 3.9 in 2013 and transports for other hypervisors were
added later.

AF_VSOCK addresses are <u32 cid, u32 port> tuples.  The 32-bit cid
integer is comparable to an IP address.  AF_VSOCK ports work like
TCP/UDP ports.

Both SOCK_STREAM and SOCK_DGRAM socket types are available.

This patch adds AF_VSOCK support to ss(8) so that sockets can be
observed.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2017-10-11 10:51:03 -07:00
Stefan Hajnoczi
b338a3e7e7 ss: allow AF_FAMILY constants >32
Linux has more than 32 address families defined in <bits/socket.h>.  Use
a 64-bit type so all of them can be represented in the filter->families
bitmask.

It's easy to introduce bugs when using (1 << AF_FAMILY) because the
value is 32-bit.  This can produce incorrect results from bitmask
operations so introduce the FAMILY_MASK() macro to eliminate these bugs.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2017-10-11 10:50:20 -07:00
Stephen Hemminger
e9b0d82dfa uapi: add include linux/vm_sockets_diag.h
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-10-11 10:49:25 -07:00
Stephen Hemminger
07682b88d8 Merge branch 'master' into net-next 2017-10-11 10:47:55 -07:00
Stephen Hemminger
237a52731b rdma: move headers to uapi
And update with version from upstream.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-10-11 10:47:28 -07:00
Stephen Hemminger
f53da99ad7 update uapi headers from 4.14-rc4 net-next
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-10-11 10:43:38 -07:00
Stephen Hemminger
92503441cc Merge branch 'master' into net-next 2017-10-11 10:43:13 -07:00
Lorenzo Colitti
596b1c94aa iproute: build more easily on Android
iproute2 contains a bunch of kernel headers, including uapi ones.
Android's libc uses uapi headers almost directly, and uses a
script to fix kernel types that don't match what userspace
expects.

For example: https://issuetracker.google.com/36987220 reports
that our struct ip_mreq_source contains "__be32 imr_multiaddr"
rather than "struct in_addr imr_multiaddr". The script addresses
this by replacing the uapi struct definition with a #include
<bits/ip_mreq.h> which contains the traditional userspace
definition.

Unfortunately, when we compile iproute2, this definition
conflicts with the one in iproute2's linux/in.h.

Historically we've just solved this problem by running "git rm"
on all the iproute2 include/linux headers that break Android's
libc.  However, deleting the files in this way makes it harder to
keep up with upstream, because every upstream change to
an include file causes a merge conflict with the delete.

This patch fixes the problem by moving the iproute2 linux headers
from include/linux to include/uapi/linux.

Tested: compiles on ubuntu trusty (glibc)

Signed-off-by: Elliott Hughes <enh@google.com>
Signed-off-by: Lorenzo Colitti <lorenzo@google.com>
2017-10-11 10:35:45 -07:00
Stephen Hemminger
b0af8fc1aa tipc: don't need custom CFLAGS
Since libmnl CFLAGS are now handled by config.mk

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-10-11 10:35:38 -07:00
Stephen Hemminger
60509b997d Merge branch 'master' into net-next 2017-10-02 08:04:13 -07:00
Stephen Hemminger
1db903def7 update headers from net-next rc
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-10-02 08:03:45 -07:00
Phil Sutter
625df645b7 Check user supplied interface name lengths
The original problem was that something like:

| strncpy(ifr.ifr_name, *argv, IFNAMSIZ);

might leave ifr.ifr_name unterminated if length of *argv exceeds
IFNAMSIZ. In order to fix this, I thought about replacing all those
cases with (equivalent) calls to snprintf() or even introducing
strlcpy(). But as Ulrich Drepper correctly pointed out when rejecting
the latter from being added to glibc, truncating a string without
notifying the user is not to be considered good practice. So let's
excercise what he suggested and reject empty, overlong or otherwise
invalid interface names right from the start - this way calls to
strncpy() like shown above become safe and the user has a chance to
reconsider what he was trying to do.

Note that this doesn't add calls to check_ifname() to all places where
user supplied interface name is parsed. In many cases, the interface
must exist already and is therefore looked up using ll_name_to_index(),
so if_nametoindex() will perform the necessary checks already.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2017-10-02 08:01:21 -07:00
Phil Sutter
ee474849c8 tc: flower: No need to cache indev arg
Since addattrstrz() will copy the provided string into the attribute
payload, there is no need to cache the data.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2017-10-02 08:01:21 -07:00
Phil Sutter
26111ab1db ip{6, }tunnel: Avoid copying user-supplied interface name around
In both files' parse_args() functions as well as in iptunnel's do_prl()
and do_6rd() functions, a user-supplied 'dev' parameter is uselessly
copied into a temporary buffer before passing it to ll_name_to_index()
or copying into a struct ifreq.  Avoid this by just caching the argv
pointer value until the later lookup/strcpy.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2017-10-02 08:01:21 -07:00
Michal Kubecek
4c0939a29e ip xfrm: use correct key length for netlink message
When SA is added manually using "ip xfrm state add", xfrm_state_modify()
uses alg_key_len field of struct xfrm_algo for the length of key passed to
kernel in the netlink message. However alg_key_len is bit length of the key
while we need byte length here. This is usually harmless as kernel ignores
the excess data but when the bit length of the key exceeds 512
(XFRM_ALGO_KEY_BUF_SIZE), it can result in buffer overflow.

We can simply divide by 8 here as the only place setting alg_key_len is in
xfrm_algo_parse() where it is always set to a multiple of 8 (and there are
already multiple places using "algo->alg_key_len / 8").

Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
2017-10-01 13:44:38 -07:00
Yulia Kartseva
73451259da tc: fix ipv6 filter selector attribute for some prefix lengths
Wrong TCA_U32_SEL attribute packing if prefixLen AND 0x1f equals 0x1f.
These are  /31, /63, /95 and /127 prefix lengths.

Example:
ip6 dst face:b00f::/31
filter parent b: protocol ipv6 pref 2307 u32
filter parent b: protocol ipv6 pref 2307 u32 fh 800: ht divisor 1
filter parent b: protocol ipv6 pref 2307 u32 fh 800::800 order 2048
key ht 800 bkt 0
  match faceb00f/ffffffff at 24

v2: previous patch was made with a wrong repo

Signed-off-by: Yulia Kartseva <hex@fb.com>
2017-10-01 13:41:29 -07:00
Stephen Hemminger
f412357017 Merge branch 'master' into net-next 2017-09-29 12:03:16 -07:00
Phil Sutter
e4139268ba ip-route: Fix for listing routes with RTAX_LOCK attribute
This fixes a corner-case for routes with a certain metric locked to
zero:

| ip route add 192.168.7.0/24 dev eth0 window 0
| ip route add 192.168.7.0/24 dev eth0 window lock 0

Since the kernel doesn't dump the attribute if it is zero, both routes
added above would appear as if they were equal although they are not.

Fix this by taking mxlock value for the given metric into account before
skipping it if it is not present.

Reported-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Phil Sutter <phil@nwl.cc>
2017-09-29 12:02:09 -07:00
Stephen Hemminger
ee7bfb52a7 Merge branch 'master' into net-next 2017-09-29 10:51:25 -07:00