This is an update for 460c03f3f3 ("iplink: double the buffer size also in
iplink_get()"). After update, we will not need to double the buffer size
every time when VFs number increased.
With call like rtnl_talk(&rth, &req.n, NULL, 0), we can simply remove the
length parameter.
With call like rtnl_talk(&rth, nlh, nlh, sizeof(req), I add a new variable
answer to avoid overwrite data in nlh, because it may has more info after
nlh. also this will avoid nlh buffer not enough issue.
We need to free answer after using.
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: Phil Sutter <phil@nwl.cc>
With commit 72b365e8e0 ("libnetlink: Double the dump buffer size")
we doubled the buffer size to support more VFs. But the VFs number is
increasing all the time. Some customers even use more than 200 VFs now.
We could not double it everytime when the buffer is not enough. Let's just
not hard code the buffer size and malloc the correct number when running.
Introduce function rtnl_recvmsg() to always return a newly allocated buffer.
The caller need to free it after using.
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: Phil Sutter <phil@nwl.cc>
Sample use case:
... add ingress qdisc
sudo $TC qdisc add dev $ETH ingress
... if we exceed rate of 1kbps (burst of 90K), do an absolute jump of 2 actions
sudo $TC actions add action police rate 1kbit burst 90k conform-exceed jump 2 / pipe
sudo $TC -s actions ls action police
action order 0: police 0x4 rate 1Kbit burst 23440b mtu 2Kb action jump 2/pipe overhead 0b
ref 1 bind 0 installed 41 sec used 41 sec
Action statistics:
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
... lets add a couple of marks so we can use them to mark exceed/not exceed
sudo $TC actions add action skbedit mark 11 ok index 11
sudo $TC actions add action skbedit mark 12 ok index 12
... if we dont exceed our rate we get a mark of 11, else mark of 12
sudo $TC filter add dev $ETH parent ffff: protocol ip prio 8 u32 \
match ip dst 127.0.0.8/32 flowid 1:10 \
action police index 4 \
action skbedit index 11 \
action skbedit index 12
Ok, lets keep this thing a little busy..
sudo ping -f -c 10000 127.0.0.8
... now lets see the filters..
sudo $TC -s filter ls dev $ETH parent ffff: protocol ip
filter pref 8 u32 chain 0
filter pref 8 u32 chain 0 fh 800: ht divisor 1
filter pref 8 u32 chain 0 fh 800::800 order 2048 key ht 800 bkt 0 flowid 1:10 not_in_hw (rule hit 20000 success 10000)
match 7f000008/ffffffff at 16 (success 10000 )
action order 1: police 0x4 rate 1Kbit burst 23440b mtu 2Kb action jump 2/pipe overhead 0b
ref 2 bind 1 installed 198 sec used 2 sec
Action statistics:
Sent 840000 bytes 10000 pkt (dropped 0, overlimits 9721 requeues 0)
backlog 0b 0p requeues 0
action order 2: skbedit mark 11 pass
index 11 ref 2 bind 1 installed 127 sec used 2 sec
Action statistics:
Sent 23436 bytes 279 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
action order 3: skbedit mark 12 pass
index 12 ref 2 bind 1 installed 127 sec used 2 sec
Action statistics:
Sent 816564 bytes 9721 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
As can be seen 97.21% of the packets were marked as exceeding the allocated
rate; you could do something clever with the skb mark after this.
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Add neigh_suppress to the type help and document it in ip-link's man page.
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Commit 530903dd90 ("ip: fix igmp parsing when iface is long") uses
variable len to keep trailing colon from interface name comparison. This
variable is local to loop body but we set it in one pass and use it in
following one(s) so that we are actually using (pseudo)random length for
comparison. This became apparent since commit b48a1161f5 ("ipmaddr: Avoid
accessing uninitialized data") always initializes len to zero so that the
name comparison is always true. As a result, "ip maddr show dev eth0" shows
IPv4 multicast addresses for all interfaces.
Instead of keeping the length, let's simply replace the trailing colon with
a null byte. The bonus is that we get correct interface name in ma.name.
Fixes: 530903dd90 ("ip: fix igmp parsing when iface is long")
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Acked-by: Phil Sutter <phil@nwl.cc>
Acked-by: Petr Vorel <pvorel@suse.cz>
Commit aba9c23a6e ("ss: enclose IPv6 address in brackets") unified
display of wildcard sockets in IPv4 and IPv6 to print the unspecified
address as '*'. Users then complained that they can't distinguish
between address families anymore, so change this again to what Stephen
Hemminger suggested:
| *:80 << both IPV6 and IPV4
| [::]:80 << IPV6_ONLY
| 0.0.0.0:80 << IPV4_ONLY
Note that on older kernels which don't support INET_DIAG_SKV6ONLY
attribute, pure IPv6 sockets will still show as '*'.
Cc: Humberto Alves <hjalves@live.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Phil Sutter <phil@nwl.cc>
This patch adds the iproute2 support for getting and setting the
per-port group_fwd_mask. It also tries to resolve the value into a more
human friendly format by printing the known protocols instead of only
the raw value.
The man page is also updated with the new option.
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Commit 959f1428 ("color: add new COLOR_NONE and disable_color function")
introducing color enum COLOR_NONE, which is not only duplicite of
COLOR_CLEAR, but also caused segfault, when running ip with --color
switch, as 'attr + 8' in color_fprintf() access array item out of
bounds. Thus removing it and restoring "magic" offset + 7.
Reproduce with:
$ ip -c a
Signed-off-by: Petr Vorel <petr.vorel@gmail.com>
Commit d0e72011 ("ip: ipaddress.c: add support for json output")
introduced passing -1 as enum color_attr. This is not only wrong as no
color_attr has value -1, but also causes another segfault in color_fprintf()
on this setup as there is no item with index -1 in array of enum attr_colors[].
Using COLOR_CLEAR is valid option.
Reproduce with:
$ COLORFGBG='0;15' ip -c a
NOTE: COLORFGBG is environmental variable used for defining whether user
has light or dark background.
COLORFGBG="0;15" is used to ask for color set suitable for light background,
COLORFGBG="15;0" is used to ask for color set suitable for dark background.
Signed-off-by: Petr Vorel <petr.vorel@gmail.com>
These keys are reported by kernel 4.14 and later under the
INET_DIAG_MD5SIG attribute, when INET_DIAG_INFO is requested (ss -i)
and we have CAP_NET_ADMIN. The additional output looks like:
md5keys:fe80::/64=signing_key,10.1.2.0/24=foobar,::1/128=Test
Signed-off-by: Ivan Delalande <colona@arista.com>
Keep it as simple as possible for now: just escape anything that is not
isprint-able, is among the "escape" parameter or '\' as an octal escape
sequence. This should be pretty easy to extend if any other user needs
something more complex in the future.
Signed-off-by: Ivan Delalande <colona@arista.com>
Some C libraries, like uClibc and musl, provide BSD compatible
strlcpy(). Add check_strlcpy() to configure, and avoid defining strlcpy
and strlcat when the C library provides them.
This fixes the following static link error with uClibc-ng:
.../sysroot/usr/lib/libc.a(strlcpy.os): In function `strlcpy':
strlcpy.c:(.text+0x0): multiple definition of `strlcpy'
../lib/libutil.a(utils.o):utils.c:(.text+0x1ddc): first defined here
collect2: error: ld returned 1 exit status
Acked-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Baruch Siach <baruch@tkos.co.il>
neigh suppression can be used to suppress arp and nd flood
to bridge ports. It maps to the recently added
kernel support for bridge port flag IFLA_BRPORT_NEIGH_SUPPRESS.
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Since kernel net-next commit c7c0bbeae950 ("net: ipmr: Add MFC offload
indication") the kernel indicates on an MFC entry whether it was offloaded
using the RTNH_F_OFFLOAD flag. Update the "ip mroute show" command to
indicate when a route is offloaded, similarly to the "ip route show"
command.
Example output:
$ ip mroute
(0.0.0.0, 239.255.0.1) Iif: sw1p7 Oifs: t_br0 State: resolved offload
(192.168.1.1, 239.255.0.1) Iif: sw1p7 Oifs: sw1p4 State: resolved offload
Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
The AF_VSOCK address family is a host<->guest communications channel
supported by VMware, KVM, and Hyper-V. Initial VMware support was
released in Linux 3.9 in 2013 and transports for other hypervisors were
added later.
AF_VSOCK addresses are <u32 cid, u32 port> tuples. The 32-bit cid
integer is comparable to an IP address. AF_VSOCK ports work like
TCP/UDP ports.
Both SOCK_STREAM and SOCK_DGRAM socket types are available.
This patch adds AF_VSOCK support to ss(8) so that sockets can be
observed.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Linux has more than 32 address families defined in <bits/socket.h>. Use
a 64-bit type so all of them can be represented in the filter->families
bitmask.
It's easy to introduce bugs when using (1 << AF_FAMILY) because the
value is 32-bit. This can produce incorrect results from bitmask
operations so introduce the FAMILY_MASK() macro to eliminate these bugs.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
iproute2 contains a bunch of kernel headers, including uapi ones.
Android's libc uses uapi headers almost directly, and uses a
script to fix kernel types that don't match what userspace
expects.
For example: https://issuetracker.google.com/36987220 reports
that our struct ip_mreq_source contains "__be32 imr_multiaddr"
rather than "struct in_addr imr_multiaddr". The script addresses
this by replacing the uapi struct definition with a #include
<bits/ip_mreq.h> which contains the traditional userspace
definition.
Unfortunately, when we compile iproute2, this definition
conflicts with the one in iproute2's linux/in.h.
Historically we've just solved this problem by running "git rm"
on all the iproute2 include/linux headers that break Android's
libc. However, deleting the files in this way makes it harder to
keep up with upstream, because every upstream change to
an include file causes a merge conflict with the delete.
This patch fixes the problem by moving the iproute2 linux headers
from include/linux to include/uapi/linux.
Tested: compiles on ubuntu trusty (glibc)
Signed-off-by: Elliott Hughes <enh@google.com>
Signed-off-by: Lorenzo Colitti <lorenzo@google.com>
The original problem was that something like:
| strncpy(ifr.ifr_name, *argv, IFNAMSIZ);
might leave ifr.ifr_name unterminated if length of *argv exceeds
IFNAMSIZ. In order to fix this, I thought about replacing all those
cases with (equivalent) calls to snprintf() or even introducing
strlcpy(). But as Ulrich Drepper correctly pointed out when rejecting
the latter from being added to glibc, truncating a string without
notifying the user is not to be considered good practice. So let's
excercise what he suggested and reject empty, overlong or otherwise
invalid interface names right from the start - this way calls to
strncpy() like shown above become safe and the user has a chance to
reconsider what he was trying to do.
Note that this doesn't add calls to check_ifname() to all places where
user supplied interface name is parsed. In many cases, the interface
must exist already and is therefore looked up using ll_name_to_index(),
so if_nametoindex() will perform the necessary checks already.
Signed-off-by: Phil Sutter <phil@nwl.cc>
Since addattrstrz() will copy the provided string into the attribute
payload, there is no need to cache the data.
Signed-off-by: Phil Sutter <phil@nwl.cc>
In both files' parse_args() functions as well as in iptunnel's do_prl()
and do_6rd() functions, a user-supplied 'dev' parameter is uselessly
copied into a temporary buffer before passing it to ll_name_to_index()
or copying into a struct ifreq. Avoid this by just caching the argv
pointer value until the later lookup/strcpy.
Signed-off-by: Phil Sutter <phil@nwl.cc>
When SA is added manually using "ip xfrm state add", xfrm_state_modify()
uses alg_key_len field of struct xfrm_algo for the length of key passed to
kernel in the netlink message. However alg_key_len is bit length of the key
while we need byte length here. This is usually harmless as kernel ignores
the excess data but when the bit length of the key exceeds 512
(XFRM_ALGO_KEY_BUF_SIZE), it can result in buffer overflow.
We can simply divide by 8 here as the only place setting alg_key_len is in
xfrm_algo_parse() where it is always set to a multiple of 8 (and there are
already multiple places using "algo->alg_key_len / 8").
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
This fixes a corner-case for routes with a certain metric locked to
zero:
| ip route add 192.168.7.0/24 dev eth0 window 0
| ip route add 192.168.7.0/24 dev eth0 window lock 0
Since the kernel doesn't dump the attribute if it is zero, both routes
added above would appear as if they were equal although they are not.
Fix this by taking mxlock value for the given metric into account before
skipping it if it is not present.
Reported-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Phil Sutter <phil@nwl.cc>