78 Commits

Author SHA1 Message Date
David Ahern
3dec72672f libnetlink: __rtnl_talk_iov should only loop max iovlen times
William reported ip hanging and bisected to a recent commit for batching
allowing more than 1 command to be sent per message. The loop over
recvmsg should never cycle more than iovlen times -- 1 response for
each command in the message.

Fixes: 72a2ff3916 ("lib/libnetlink: Add a new function rtnl_talk_iov")
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-03-02 13:30:34 -08:00
Arkadi Sharshevsky
049c58539f devlink: mnlg: Add support for extended ack
Add support for extended ack.

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-02-23 08:36:05 -08:00
Chris Mi
72a2ff3916 lib/libnetlink: Add a new function rtnl_talk_iov
rtnl_talk can only send a single message to the kernel. Add a new function
rtnl_talk_iov that can send multiple messages to the kernel.
rtnl_talk_iov takes struct iovec * and iovlen as arguments.

Signed-off-by: Chris Mi <chrism@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-01-14 09:03:33 -08:00
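As a rough illustration of the idea in 72a2ff3916, several buffers can be handed to the kernel in one sendmsg() call through a struct iovec array. This is only a sketch and not the libnetlink code: send_batch is a hypothetical name, and in the real function each iovec entry would hold one netlink message.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Hypothetical helper: gather iovlen buffers into a single sendmsg() call.
 * Returns the number of bytes sent, or -1 on error. */
ssize_t send_batch(int fd, struct iovec *iov, size_t iovlen)
{
    struct msghdr msg = {
        .msg_iov = iov,
        .msg_iovlen = iovlen,
    };

    assert(iov != NULL && iovlen > 0);
    return sendmsg(fd, &msg, 0);
}
```

On a datagram socket the receiver sees the buffers concatenated into one datagram, which is why a batched request can produce one response per command.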
Stephen Hemminger
913352fe54 drop unneeded include of syslog.h
Only arpd uses syslog

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-11-12 16:22:36 -08:00
David Ahern
844c37b423 libnetlink: Handle extack messages for non-error case
Kernel can now return non-fatal error messages in extack facility.
Update iproute2 to dump them to the user if present.
- rename nl_dump_ext_err to nl_dump_ext_ack
- rename errmsg to msg
- add call to nl_dump_ext_ack in rtnl_dump_done and __rtnl_talk for
  non-error path

Signed-off-by: David Ahern <dsahern@gmail.com>
Tested-by: Ido Schimmel <idosch@mellanox.com>
2017-11-09 09:46:50 +09:00
Hangbin Liu
86bf43c7c2 lib/libnetlink: update rtnl_talk to support malloc buff at run time
This is an update for 460c03f3f3 ("iplink: double the buffer size also in
iplink_get()"). After this update, we no longer need to double the buffer
size every time the number of VFs increases.

For calls like rtnl_talk(&rth, &req.n, NULL, 0), we can simply remove the
length parameter.

For calls like rtnl_talk(&rth, nlh, nlh, sizeof(req)), I add a new variable
answer to avoid overwriting data in nlh, because there may be more info after
nlh. This also avoids the problem of the nlh buffer being too small.

answer must be freed after use.

Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: Phil Sutter <phil@nwl.cc>
2017-10-26 12:29:29 +02:00
Hangbin Liu
2d34851cd3 lib/libnetlink: re malloc buff if size is not enough
With commit 72b365e8e0 ("libnetlink: Double the dump buffer size")
we doubled the buffer size to support more VFs. But the number of VFs keeps
increasing. Some customers even use more than 200 VFs now.

We cannot double the buffer every time it turns out to be too small. Let's
stop hard-coding the buffer size and malloc the correct size at run time.

Introduce function rtnl_recvmsg() to always return a newly allocated buffer.
The caller needs to free it after use.

Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: Phil Sutter <phil@nwl.cc>
2017-10-26 12:29:29 +02:00
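One way to size a receive buffer at run time, as 2d34851cd3 describes, is to peek at the pending datagram first and allocate exactly what it needs. The sketch below assumes Linux, where MSG_TRUNC makes recv() report the full datagram length. recv_alloc is a hypothetical name and not the rtnl_recvmsg() from the commit.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: receive one datagram into a freshly allocated buffer.
 * On success returns the length and stores the buffer in *answer; the
 * caller must free() it. Returns -1 on error, 0 for an empty datagram. */
ssize_t recv_alloc(int fd, char **answer)
{
    char probe;
    ssize_t len;
    char *buf;

    assert(answer != NULL);
    *answer = NULL;

    /* MSG_PEEK leaves the datagram queued; MSG_TRUNC reports its real size. */
    len = recv(fd, &probe, sizeof(probe), MSG_PEEK | MSG_TRUNC);
    if (len <= 0)
        return len;

    buf = malloc((size_t)len);
    if (buf == NULL)
        return -1;

    /* Second call actually consumes the datagram. */
    len = recv(fd, buf, (size_t)len, 0);
    if (len < 0) {
        free(buf);
        return -1;
    }
    *answer = buf;
    return len;
}
```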
Phil Sutter
893deac4c4 lib/libnetlink: Don't pass NULL parameter to memcpy()
Both addattr_l() and rta_addattr_l() may be called with NULL data
pointer and 0 alen parameters. Avoid calling memcpy() in that case.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2017-08-24 15:22:10 -07:00
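The guard described in 893deac4c4 can be sketched as follows. Passing a NULL pointer to memcpy() is undefined behavior even when the length is 0, so the copy is skipped for empty attributes. sketch_addattr is a hypothetical, simplified stand-in for addattr_l(), not the library function itself.

```c
#include <assert.h>
#include <string.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>

/* Hypothetical helper: append one attribute to a netlink message that
 * lives in a buffer of maxlen bytes. Returns 0 on success, -1 if full. */
int sketch_addattr(struct nlmsghdr *n, size_t maxlen, unsigned short type,
                   const void *data, size_t alen)
{
    size_t len = RTA_LENGTH(alen);
    struct rtattr *rta;

    assert(n != NULL);
    if (NLMSG_ALIGN(n->nlmsg_len) + RTA_ALIGN(len) > maxlen)
        return -1;

    rta = (struct rtattr *)((char *)n + NLMSG_ALIGN(n->nlmsg_len));
    rta->rta_type = type;
    rta->rta_len = (unsigned short)len;
    if (alen)                       /* never call memcpy(dst, NULL, 0) */
        memcpy(RTA_DATA(rta), data, alen);
    n->nlmsg_len = (unsigned int)(NLMSG_ALIGN(n->nlmsg_len) + RTA_ALIGN(len));
    return 0;
}
```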
Stephen Hemminger
0efa625765 libnetlink: drop unused parameter to rtnl_dump_done
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-08-24 15:02:48 -07:00
David Ahern
e5fa0e6fe7 libnetlink: Fix extack attribute parsing
Initialize tb in nl_dump_ext_err since not all attributes will be
sent in the messages.

Add error checking on mnl_attr_parse and print messages on the off
chance the ext ack attributes fail to validate.

Signed-off-by: David Ahern <dsahern@gmail.com>
2017-08-18 08:47:34 -07:00
David Ahern
fb6cb30774 lib: Dump ext-ack string by default
In time, errfn can be implemented for link, route, etc commands to
give a much more detailed response (e.g., point to the attribute
that failed). Doing so requires much more complicated processing of the
message and conversion of attribute ids to names.

In any case the error string returned by the kernel should be dumped
to the user, so make that happen now.

Signed-off-by: David Ahern <dsahern@gmail.com>
2017-08-09 09:14:01 -07:00
Stephen Hemminger
7d23fa5591 lib: fix extended ack with and without libmnl
The code was always building without libmnl support, so it was
doing nothing.

Fixes: b6432e68ac ("iproute: Add support for extended ack to rtnl_talk")
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-08-07 12:01:49 -07:00
Stephen Hemminger
b6432e68ac iproute: Add support for extended ack to rtnl_talk
Add support for extended ack error reporting via libmnl.
Add a new function rtnl_talk_extack that takes a callback as an input
arg. If a netlink response contains extack attributes, the callback is
invoked with the err string, offset in the message and a pointer
to the message returned by the kernel.

If iproute2 is built without libmnl, it will still work but
extended error reports from kernel will not be available.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-08-04 09:54:00 -07:00
David Ahern
05a14fc121 netlink: Change rtnl_dump_done to always show error
The original code which became rtnl_dump_done only shows netlink errors
if the protocol is NETLINK_SOCK_DIAG, but netlink dumps always append
the length which contains any error encountered during the dump. Update
rtnl_dump_done to always show the error if there is one.

As an *example* without this patch, dumping a route object that exceeds
the internal buffer size terminates with no message to the user -- the
dump just ends because the NLMSG_DONE attribute was received. With this
patch the user at least gets a message that the dump was aborted.

$ ip ro ls
default via 10.0.2.2 dev eth0
10.0.2.0/24 dev eth0 proto kernel scope link src 10.0.2.15
10.10.0.0/16 dev veth1 proto kernel scope link src 10.10.0.1
172.16.1.0/24 dev br0.11 proto kernel scope link src 172.16.1.1
Error: Buffer too small for object
Dump terminated

The point of this patch is to notify the user of a failure versus
silently exiting on a partial dump. Because the NLMSG_DONE attribute
was received, the entire dump needs to be restarted to use a larger
buffer for EMSGSIZE errors. That could be done automatically but it
has other user impacts (e.g., duplicate output if the dump is
restarted) and should be the subject of a different patch.

Signed-off-by: David Ahern <dsahern@gmail.com>
2017-05-30 17:32:38 -07:00
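The check described in 05a14fc121 can be sketched like this: the payload of the final NLMSG_DONE message is read as an int, and a negative value is treated as an error from the dump. dump_done_status is a hypothetical name, and returning -EBADMSG for a truncated message is a choice made for this sketch, not taken from the commit.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <linux/netlink.h>

/* Hypothetical helper: inspect the NLMSG_DONE message that ends a dump.
 * Returns 0 if the dump finished cleanly, or a negative errno value. */
int dump_done_status(const struct nlmsghdr *h)
{
    int err = 0;

    assert(h != NULL && h->nlmsg_type == NLMSG_DONE);

    /* The payload must be large enough to hold the int status. */
    if (h->nlmsg_len < NLMSG_LENGTH(sizeof(int)))
        return -EBADMSG;

    memcpy(&err, NLMSG_DATA(h), sizeof(err));
    return err < 0 ? err : 0;
}
```

A caller could print strerror(-status) and report that the dump was terminated, for example when the status is -EMSGSIZE.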
David Ahern
3ad6d17638 netlink: Add flag to suppress print of nlmsg error
Allow callers of the dump API to handle nlmsg errors (e.g., an
unsupported feature). Setting RTNL_HANDLE_F_SUPPRESS_NLERR in the
rtnl_handle avoids unnecessary messages to the users in some cases.
For example,

  RTNETLINK answers: Operation not supported

when probing for support of a new feature.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
2017-04-14 16:00:15 -07:00
Stephen Hemminger
892a25e286 libnetlink: break up dump function
Indentation is deep here.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2016-12-13 10:41:29 -08:00
David Ahern
463d9efaa2 libnetlink: Add variant of rtnl_talk that does not display RTNETLINK answers error
iplink_vrf has 2 functions used to validate a user given device name is
a VRF device and to return the table id. If the user string is not a
device name, ip commands with a vrf keyword show a confusing error
message: "RTNETLINK answers: No such device".

Add a variant of rtnl_talk that does not display the "RTNETLINK answers"
message and update iplink_vrf to use it.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
2016-12-13 10:20:16 -08:00
Cyrill Gorcunov
9f66764e30 libnetlink: Add test for error code returned from netlink reply
If some diag module is not present in the system, say because the
kernel is not modern enough, we simply skip the error code that is
reported. Instead we should check the data length in NLMSG_DONE
and handle the unsupported case.

Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
2016-12-01 10:55:56 -08:00
Stephen Hemminger
2c500a4dc2 libnetlink: style cleanups
Follow kernel style related cleanups:
 * break long lines
 * remove unnecessary void * cast
2016-11-29 13:15:08 -08:00
Zhang Shengju
1b109a30bf libnetlink: reduce size of message sent to kernel
Fixes commit 246f57c4086d99fa ("ip link: Add support for kernel
side filtering").

This patch reduces the size of the message sent to kernel space. Before this
patch, the command 'ip link show' sent 1056 bytes. With this
patch, we only need to send 40 bytes.

Signed-off-by: Zhang Shengju <zhangshengju@cmss.chinamobile.com>
2016-11-29 13:03:00 -08:00
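The two sizes quoted in 1b109a30bf can be reproduced with a small sketch. It assumes a request struct with a 1024-byte attribute buffer carrying a single 32-bit filter attribute. That layout is an assumption chosen to match the numbers, and link_req and build_link_dump are hypothetical names. Sending nlmsg_len bytes instead of sizeof(req) is the whole saving.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>

/* Assumed layout: header, ifinfomsg, then room for attributes. */
struct link_req {
    struct nlmsghdr nlh;
    struct ifinfomsg ifm;
    char buf[1024];
};

/* Hypothetical helper: fill in a link dump request carrying one
 * IFLA_EXT_MASK attribute and return the bytes that must be sent. */
size_t build_link_dump(struct link_req *req, uint32_t filter_mask)
{
    struct rtattr *rta;

    assert(req != NULL);
    memset(req, 0, sizeof(*req));
    req->nlh.nlmsg_len = NLMSG_LENGTH(sizeof(struct ifinfomsg));
    req->nlh.nlmsg_type = RTM_GETLINK;
    req->nlh.nlmsg_flags = NLM_F_DUMP | NLM_F_REQUEST;
    req->ifm.ifi_family = AF_UNSPEC;

    /* Append the attribute right after the fixed-size part. */
    rta = (struct rtattr *)((char *)req + NLMSG_ALIGN(req->nlh.nlmsg_len));
    rta->rta_type = IFLA_EXT_MASK;
    rta->rta_len = RTA_LENGTH(sizeof(filter_mask));
    memcpy(RTA_DATA(rta), &filter_mask, sizeof(filter_mask));
    req->nlh.nlmsg_len = NLMSG_ALIGN(req->nlh.nlmsg_len) + RTA_ALIGN(rta->rta_len);

    return req->nlh.nlmsg_len;
}
```

Under these assumptions sizeof(struct link_req) is 16 + 16 + 1024 = 1056 bytes, while the populated part is 16 + 16 + 8 = 40 bytes.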
Nikolay Aleksandrov
7abf5de677 bridge: vlan: add support to display per-vlan statistics
This patch adds support for the stats argument to the bridge
vlan command which will display the per-vlan statistics and the device
each vlan belongs to with its flags. The supported command filtering
options are dev and vid. Also the man page is updated to explain the new
option.
The patch uses the new RTM_GETSTATS interface with a filter_mask to dump
all bridges and ports vlans. Later we can add support for using the
per-device dump and filter it in the kernel instead.

Example:
$ bridge -s vlan show
port             vlan id
br0               1 Egress Untagged
                    RX: 2536 bytes 20 packets
                    TX: 2536 bytes 20 packets
                  101
                    RX: 43158 bytes 50 packets
                    TX: 43158 bytes 50 packets
eth1              1 Egress Untagged
                    RX: 2536 bytes 20 packets
                    TX: 2536 bytes 20 packets
                  100
                    RX: 0 bytes 0 packets
                    TX: 0 bytes 0 packets
                  101
                    RX: 43158 bytes 50 packets
                    TX: 43158 bytes 50 packets
                  102
                    RX: 16897 bytes 93 packets
                    TX: 0 bytes 0 packets

The format is the same as bridge vlan show but with stats, even though
under the hood the calls done to the kernel are different.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
2016-08-29 10:58:40 -07:00
Phil Sutter
d17b136f7d Use C99 style initializers everywhere
This big patch was compiled by vimgrepping for memset calls and changing
to C99 initializer if applicable. One notable exception is the
initialization of union bpf_attr in tc/tc_bpf.c: changing it would break
for older gcc versions (at least <=3.4.6).

Calls to memset for struct rtattr pointer fields for parse_rtattr*()
were just dropped since they are not needed.

The changes here allowed the compiler to discover some unused variables,
so get rid of them, too.

Signed-off-by: Phil Sutter <phil@nwl.cc>
Acked-by: David Ahern <dsa@cumulusnetworks.com>
2016-07-20 12:05:24 -07:00
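The kind of change made in d17b136f7d can be illustrated with a netlink address structure. With a C99 designated initializer, every member that is not named is zero-initialized, so no separate memset() call is needed. kernel_addr is a hypothetical name used only for this sketch.

```c
#include <assert.h>
#include <sys/socket.h>
#include <linux/netlink.h>

/* Hypothetical helper: build the address of the kernel's netlink endpoint.
 * Members not listed in the initializer (nl_pad, nl_pid, nl_groups)
 * are implicitly set to zero. */
struct sockaddr_nl kernel_addr(void)
{
    struct sockaddr_nl nladdr = { .nl_family = AF_NETLINK };

    return nladdr;
}
```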
David Ahern
b0a4ce620e ip link: Add support for kernel side filtering
Kernel gained support for filtering link dumps with commit dc599f76c22b
("net: Add support for filtering link dump by master device and kind").
Add support to ip link command. If a user passes master device or
kind to ip link command they are added to the link dump request message.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
2016-05-18 11:52:14 -07:00
Stephen Hemminger
e9e9365b56 scrub out whitespace issues
Run script that removes trailing whitespace everywhere.
2016-03-27 10:50:14 -07:00
Phil Sutter
72b365e8e0 libnetlink: Double the dump buffer size
There have been reports about 'ip addr' printing "Message truncated" on
systems with large numbers of VFs. Although I haven't been able to get
my hands on hardware suitable to reproduce this, increasing the dump
buffer has been reported to resolve the issue. For want of a better
idea, just double the buffer size to 32k.

Feels like this opportunistic buffer size selection is rather
working around a design flaw in libnetlink or maybe even the netlink
protocol itself.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2016-03-06 12:51:18 -08:00
Lorenzo Colitti
57fdf2d4d9 libnetlink: don't print NETLINK_SOCK_DIAG errors in rtnl_talk
This change is a no-op, as currently no code uses rtnl_talk on
NETLINK_SOCK_DIAG_BY_FAMILY sockets. It is needed to suppress
spurious errors when using SOCK_DESTROY via rtnl_talk.

Signed-off-by: Lorenzo Colitti <lorenzo@google.com>
2016-01-18 11:47:03 -08:00
Nicolas Dichtel
ed108cfc02 libnetlink: don't confuse variables in rtnl_talk()
There are two variables named 'len' in rtnl_talk. In fact, commit
c079e121a7 didn't work. For example, it was possible to trigger
a seg fault with this command:
$ ip link set gre2 type ip6gre hoplimit 32

Let's rename the argument len to maxlen.

Fixes: c079e121a7 ("libnetlink: add size argument to rtnl_talk")
Reported-by: Thomas Faivre <thomas.faivre@6wind.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
2015-12-10 08:45:21 -08:00
Phil Sutter
8e72880f6b libnetlink: introduce nc_flags
Allow for a filter to ignore certain nlmsg_flags.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2015-11-29 11:47:29 -08:00
Stephen Hemminger
c6646c1ea5 Merge branch 'master' into net-next 2015-10-16 16:03:32 -07:00
Roopa Prabhu
303cc9cbee libnetlink: introduce rta_nest and u8, u16, u64 helpers for nesting within rtattr
This patch introduces two new APIs, rta_nest and rta_nest_end, to
nest attributes inside a rta attribute represented by 'struct rtattr'
as required to construct a nexthop. Also adds rta_addattr* variants
for u8, u16 and u64 as needed to support encapsulation.

Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Acked-by: Jiri Benc <jbenc@redhat.com>
2015-10-16 16:00:47 -07:00
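The nesting scheme from 303cc9cbee can be sketched as follows: opening a nest appends an empty attribute and remembers where it is, and closing the nest sets that attribute's length so it covers everything added in between. All three function names are hypothetical, simplified stand-ins and not the libnetlink helpers.

```c
#include <assert.h>
#include <string.h>
#include <linux/rtnetlink.h>

/* First byte past the current contents of rta. */
static struct rtattr *sketch_tail(struct rtattr *rta)
{
    return (struct rtattr *)((char *)rta + RTA_ALIGN(rta->rta_len));
}

/* Append one sub-attribute inside rta, which owns maxlen bytes. */
int sketch_rta_add(struct rtattr *rta, size_t maxlen, unsigned short type,
                   const void *data, size_t alen)
{
    size_t len = RTA_LENGTH(alen);
    struct rtattr *sub;

    assert(rta != NULL);
    if (RTA_ALIGN(rta->rta_len) + RTA_ALIGN(len) > maxlen)
        return -1;
    sub = sketch_tail(rta);
    sub->rta_type = type;
    sub->rta_len = (unsigned short)len;
    if (alen)
        memcpy(RTA_DATA(sub), data, alen);
    rta->rta_len = (unsigned short)(RTA_ALIGN(rta->rta_len) + RTA_ALIGN(len));
    return 0;
}

/* Open a nest: add an empty attribute and return a pointer to it. */
struct rtattr *sketch_nest(struct rtattr *rta, size_t maxlen, unsigned short type)
{
    struct rtattr *nest = sketch_tail(rta);

    if (sketch_rta_add(rta, maxlen, type, NULL, 0) < 0)
        return NULL;
    return nest;
}

/* Close a nest: its length spans everything added since sketch_nest(). */
void sketch_nest_end(struct rtattr *rta, struct rtattr *nest)
{
    nest->rta_len = (unsigned short)((char *)sketch_tail(rta) - (char *)nest);
}
```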
David Ahern
0d238ca2b8 ip neigh: Add support for filtering dumps by master device
Add support for filtering neighbor dumps by master device. Kernel side
support provided by commit 21fdd092acc7. Since the feature is not
available in older kernels the user is given a warning message if the
kernel does not support the request.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
2015-10-12 09:39:37 -07:00
Stephen Hemminger
03371c7d98 Merge branch 'master' into net-next
Conflicts:
	include/linux/tcp.h
	lib/libnetlink.c
2015-05-28 09:18:01 -07:00
Stephen Hemminger
c079e121a7 libnetlink: add size argument to rtnl_talk
There have been several instances where response from kernel
has overrun the stack buffer from the caller. Avoid future problems
by passing a size argument.

Also drop the unused peer and group arguments to rtnl_talk.
2015-05-27 13:00:21 -07:00
Nicolas Dichtel
449b824ad1 ipmonitor: allows to monitor in several netns
With this patch, it's now possible to listen in all netns that have an nsid
assigned into the netns where the socket is opened.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
2015-05-21 15:28:56 -07:00
Nicolas Dichtel
0628cddd9d libnetlink: introduce rtnl_listen_filter_t
There is no functional change with this commit. It only prepares the next one.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
2015-05-21 15:28:56 -07:00
Jiri Pirko
decbb4378c libnetlink: add parse_rtattr_one_nested helper
Sometimes, it is more convenient to get only one specific nested attribute by
type. For example for IFLA_AF_SPEC where type is address family (AF_INET6).
So add this helper for this purpose.

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
2015-01-07 15:11:35 -08:00
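A helper of the kind decbb4378c describes can be sketched with the standard RTA_* macros: walk the attributes nested inside an outer attribute and return the first one whose type matches. find_nested is a hypothetical name and this is not the parse_rtattr_one_nested implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <linux/rtnetlink.h>

/* Hypothetical helper: return the nested attribute of the given type
 * inside outer, or NULL if there is none. */
struct rtattr *find_nested(int type, struct rtattr *outer)
{
    struct rtattr *rta;
    int len;

    assert(outer != NULL);
    rta = RTA_DATA(outer);
    len = RTA_PAYLOAD(outer);

    while (RTA_OK(rta, len)) {
        if (rta->rta_type == type)
            return rta;
        rta = RTA_NEXT(rta, len);   /* also decrements len */
    }
    return NULL;
}
```

For IFLA_AF_SPEC, where each nested attribute's type is an address family, a call such as find_nested(AF_INET6, attr) would pick out the IPv6 part.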
Vadim Kochan
486ccd99a0 ss: Use rtnl_dump_filter for inet_show_netlink
Just another refactoring for ss to use rtnl API from lib

Signed-off-by: Vadim Kochan <vadim4j@gmail.com>
2014-12-27 10:21:10 -08:00
vadimk
8a4025f6a4 ss: Use rtnl_dump_filter in handle_netlink_request
Replaced handling netlink messages by rtnl_dump_filter
from lib/libnetlink.c, also:

    - removed unused dump_fp arg;
    - added MAGIC_SEQ #define for 123456 seq id;
    - silently exit if ENOENT errno is caused for NETLINK_SOCK_DIAG proto
        in lib/libnetlink.c: rtnl_dump_filter_l(...) function. This fix
        was added in a3fd8e58c1 by Eric
        for misc/ss.c

Signed-off-by: Vadim Kochan <vadim4j@gmail.com>
2014-12-20 12:17:02 -08:00
Eric Dumazet
e557212eda netlink: extend buffers to 16K
Starting from linux-3.15 (commit 9063e21fb026, "netlink: autosize skb
lengths"), kernel is able to send up to 16K in netlink replies.

This change enables iproute2 commands to get bigger chunks,
without breaking compatibility with old kernels.

Signed-off-by: Eric Dumazet <edumazet@google.com>
2014-10-29 22:43:04 -07:00
Andrey Vagin
bcb9d40319 ip: set the close-on-exec flag for descriptors
Otherwise a program executed by "ip netns exec" has two extra
descriptors.

$ ip netns exec test /bin/bash
$ lsof -p $$
...
bash    817 root    0u   CHR  136,0       0t0          3 /dev/pts/0
bash    817 root    1u   CHR  136,0       0t0          3 /dev/pts/0
bash    817 root    2u   CHR  136,0       0t0          3 /dev/pts/0
bash    817 root    3u  sock    0,6       0t0      13386 protocol: NETLINK
bash    817 root    4r   REG    0,3         0 4026532155 net
bash    817 root  255u   CHR  136,0       0t0          3 /dev/pts/0

Cc: Stephen Hemminger <stephen@networkplumber.org>
Reported-by: Dilip Daya <dilip.daya@hp.com>
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Andrey Vagin <avagin@openvz.org>
2013-06-04 09:11:06 -07:00
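One general way to keep a descriptor from leaking into an executed program, as bcb9d40319 sets out to do, is to set the close-on-exec flag with fcntl(). The sketch below shows that technique on an arbitrary descriptor. set_cloexec is a hypothetical name, and the commit itself may set the flag differently, for example when the descriptor is created.

```c
#include <assert.h>
#include <fcntl.h>

/* Hypothetical helper: mark fd so that it is closed across execve().
 * Returns 0 on success, -1 on error. */
int set_cloexec(int fd)
{
    int flags = fcntl(fd, F_GETFD);

    if (flags < 0)
        return -1;
    /* Preserve any other descriptor flags that are already set. */
    if (fcntl(fd, F_SETFD, flags | FD_CLOEXEC) < 0)
        return -1;
    return 0;
}
```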
Alexander Duyck
63338dca45 libnetlink: Use ifinfomsg instead of rtgenmsg in rtnl_wilddump_req_filter
This change corrects a kernel incompatibility that was resulting in the
ext_filter_mask not being correctly discovered by the kernel as it is buried
somewhere in the ifinfomsg.

Reported-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Acked-by: David S. Miller <davem@davemloft.net>
2013-04-26 16:40:30 -07:00
Nicolas Dichtel
16f02e145e libnetlink: check flag NLM_F_DUMP_INTR during dumps
When this flag is set, it means that the dump was interrupted and the result
may be inconsistent.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
2013-03-28 14:44:41 -07:00
Vlad Yasevich
b1b7ce0f0d bridge: Add support for printing bridge port attributes
Output new nested bridge port attributes.

Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
2013-03-16 10:02:18 -07:00
Vlad Yasevich
9eff0e5cc4 bridge: Add vlan configuration support
Recent kernel patches added support for VLAN filtering on the bridge.
This functionality allows one to turn a basic bridge into a VLAN bridge,
where VLANs dictate packet forwarding and header transformation.

To configure the VLANs on the bridge and its ports a new command is
added to the 'bridge' utility.

   # bridge vlan add dev eth0 vid 10 pvid untagged brdev
   # bridge vlan add
   # bridge vlan delete dev eth0 vid 10
   # bridge vlan show

This command supports the following flags:
   master - perform the operation on the software bridge device.  This is
	    the default behavior.
   self  -  perform the operation on the hardware associated with the port.
            This flag is required when the device is the bridge device and
	    the configuration is desired on the bridge device itself (not
	    one of the ports).
   pvid  -  Set the PVID (port vlan id) for a given port.  Any untagged
            frames arriving on the port will be assigned to this vlan.
   untagged - Sets the egress policy for a given vlan.  Default port
            egress policy is tagged.  Set this flag if you wish traffic
            associated with this VLAN to exit the port untagged.

Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
2013-03-06 11:03:08 -08:00
Lutz Jaenicke
257422f77f rtnl_wilddump_request: fix alignment issue for embedded platforms
Platforms have different alignment requirements which need to be
fulfilled by the compiler. If the structure elements are already
4 byte (NLMSG_ALIGNTO) aligned by the compiler adding an explicit
padding element (align_rta) is not allowed.
Use __attribute__ ((aligned (NLMSG_ALIGNTO))) in order to achieve
the required alignment.
Experienced on ARM (xscale) with symptom
  netlink: 12 bytes leftover after parsing attributes

Tested on:
  ARM      (32bit Big Endian)
  PowerPC  (32bit Big Endian)
  x86_64   (64bit Little Endian)
Each with different alignment requirements.

Signed-off-by: Lutz Jaenicke <ljaenicke@innominate.com>
2013-02-19 07:45:59 -08:00
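The approach in 257422f77f can be sketched with a request struct whose attribute member carries an alignment attribute instead of relying on a hand-written padding field. The struct below is an assumed layout for illustration (a one-byte rtgenmsg followed by the attribute), it needs GCC or Clang, and wilddump_req and ext_req_offset are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>

/* Assumed request layout: the compiler, not a manual padding member,
 * places ext_req on an NLMSG_ALIGNTO (4 byte) boundary. */
struct wilddump_req {
    struct nlmsghdr nlh;
    struct rtgenmsg g;
    struct rtattr ext_req __attribute__((aligned(NLMSG_ALIGNTO)));
    __u32 ext_filter_mask;
};

/* Hypothetical helper: offset of the attribute within the request. */
size_t ext_req_offset(void)
{
    return offsetof(struct wilddump_req, ext_req);
}
```

With this layout the attribute starts at a multiple of NLMSG_ALIGNTO regardless of how the platform would otherwise pad the one-byte rtgenmsg.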
Pavel Emelyanov
b8cf1e9ae3 iproute: Fix errno propagation from rtnl_talk
Callers of rtnl_talk check errno value for their needs. In particular, the addrs
and routes restoring code validly reports success if the EEXIST is in there.

However, the errno value can be sometimes screwed up by the perror call. Thus
we should only set it _after_ the message was emitted.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2012-08-20 12:54:48 -07:00
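The ordering issue from b8cf1e9ae3 can be sketched as follows: any library call made while printing the message may overwrite errno, so errno is assigned only after the message has been written. report_nl_error is a hypothetical name, and the message text is copied from the "RTNETLINK answers" output quoted elsewhere in this log.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: nl_error is the negative errno value carried in a
 * netlink error message. Prints it, then leaves errno set for the caller. */
int report_nl_error(int nl_error)
{
    assert(nl_error < 0);

    fprintf(stderr, "RTNETLINK answers: %s\n", strerror(-nl_error));
    errno = -nl_error;      /* set last, so printing cannot clobber it */
    return -1;
}
```

A caller restoring addresses or routes can then test errno == EEXIST after a failure and treat it as success.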
Rose, Gregory V
bd886ebb1f iproute2: Add netlink attribute to filter dump requests
Add a new netlink attribute type to the dump request to allow
filtering of the information returned for the respective matching
interfaces.  At this time the only filter defined is to request
virtual function (VF) device info for interfaces that have attached VFs.

It will also be possible to extend the request with other yet to be
defined netlink attributes in the future.

Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
2012-04-12 09:36:30 -07:00
Masatake YAMATO
aa38c3eefa using NLM_F_DUMP flag constant in libnetlink.c
This is a trivial patch for libnetlink.c in iproute2.

In iproute2/include/linux/netlink.h NLM_F_DUMP is defined as:

   #define NLM_F_DUMP	(NLM_F_ROOT|NLM_F_MATCH)

It is not used in libnetlink.c. If used, the code becomes a bit easier
to read.

Signed-off-by: Masatake YAMATO <yamato@redhat.com>
2012-01-19 14:16:12 -08:00
Stephen Hemminger
cd70f3f522 libnetlink: remove unused junk callback
Both rtnl_talk and rtnl_dump had a callback for handling portions
of netlink message that do not match the correct pid or seq.
But this callback was never used by any part of iproute2 so remove
it.
2011-12-28 10:37:12 -08:00
Stephen Hemminger
2aa3dd29a7 libnetlink: add more attribute functions
New functions to handle u8, u16, u32, u64 and string attribute types.
Use common code for all attribute wrappers.
2011-12-23 10:43:54 -08:00