In time, errfn can be implemented for link, route, etc commands to
give a much more detailed response (e.g., point to the attribute
that failed). Doing so is much more complicated to process the
message and convert attribute ids to names.
In any case the error string returned by the kernel should be dumped
to the user, so make that happen now.
Signed-off-by: David Ahern <dsahern@gmail.com>
The code was always building without libmnl support, so it was
doing nothing.
Fixes: b6432e68ac ("iproute: Add support for extended ack to rtnl_talk")
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Add support for extended ack error reporting via libmnl.
Add a new function rtnl_talk_extack that takes a callback as an input
arg. If a netlink response contains extack attributes, the callback is
is invoked with the the err string, offset in the message and a pointer
to the message returned by the kernel.
If iproute2 is built without libmnl, it will still work but
extended error reports from kernel will not be available.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
The original code which became rtnl_dump_done only shows netlink errors
if the protocol is NETLINK_SOCK_DIAG, but netlink dumps always appends
the length which contains any error encountered during the dump. Update
rtnl_dump_done to always show the error if there is one.
As an *example* without this patch, dumping a route object that exceeds
the internal buffer size terminates with no message to the user -- the
dump just ends because the NLMSG_DONE attribute was received. With this
patch the user at least gets a message that the dump was aborted.
$ ip ro ls
default via 10.0.2.2 dev eth0
10.0.2.0/24 dev eth0 proto kernel scope link src 10.0.2.15
10.10.0.0/16 dev veth1 proto kernel scope link src 10.10.0.1
172.16.1.0/24 dev br0.11 proto kernel scope link src 172.16.1.1
Error: Buffer too small for object
Dump terminated
The point of this patch is to notify the user of a failure versus
silently exiting on a partial dump. Because the NLMSG_DONE attribute
was received, the entire dump needs to be restarted to use a larger
buffer for EMSGSIZE errors. That could be done automatically but it
has other user impacts (e.g., duplicate output if the dump is
restarted) and should be the subject of a different patch.
Signed-off-by: David Ahern <dsahern@gmail.com>
Allow callers of the dump API to handle nlmsg errors (e.g., an
unsupported feature). Setting RTNL_HANDLE_F_SUPPRESS_NLERR in the
rtnl_handle avoids unnecessary messages to the users in some case.
For example,
RTNETLINK answers: Operation not supported
when probing for support of a new feature.
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
iplink_vrf has 2 functions used to validate a user given device name is
a VRF device and to return the table id. If the user string is not a
device name ip commands with a vrf keyword show a confusing error
message: "RTNETLINK answers: No such device".
Add a variant of rtnl_talk that does not display the "RTNETLINK answers"
message and update iplink_vrf to use it.
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
In case if some diag module is not present in the system,
say the kernel is not modern enough, we simply skip the
error code reported. Instead we should check for data
length in NLMSG_DONE and process unsupported case.
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Fixes commit 246f57c4086d99fa ("ip link: Add support for kernel
side filtering").
This patch reduce the size of message sent to kernel space. Before this
patch, for command: 'ip link show', we will sent 1056 bytes. With this
patch, we only need to send 40 bytes.
Signed-off-by: Zhang Shengju <zhangshengju@cmss.chinamobile.com>
This patch adds support for the stats argument to the bridge
vlan command which will display the per-vlan statistics and the device
each vlan belongs to with its flags. The supported command filtering
options are dev and vid. Also the man page is updated to explain the new
option.
The patch uses the new RTM_GETSTATS interface with a filter_mask to dump
all bridges and ports vlans. Later we can add support for using the
per-device dump and filter it in the kernel instead.
Example:
$ bridge -s vlan show
port vlan id
br0 1 Egress Untagged
RX: 2536 bytes 20 packets
TX: 2536 bytes 20 packets
101
RX: 43158 bytes 50 packets
TX: 43158 bytes 50 packets
eth1 1 Egress Untagged
RX: 2536 bytes 20 packets
TX: 2536 bytes 20 packets
100
RX: 0 bytes 0 packets
TX: 0 bytes 0 packets
101
RX: 43158 bytes 50 packets
TX: 43158 bytes 50 packets
102
RX: 16897 bytes 93 packets
TX: 0 bytes 0 packets
The format is the same as bridge vlan show but with stats, even though
under the hood the calls done to the kernel are different.
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
This big patch was compiled by vimgrepping for memset calls and changing
to C99 initializer if applicable. One notable exception is the
initialization of union bpf_attr in tc/tc_bpf.c: changing it would break
for older gcc versions (at least <=3.4.6).
Calls to memset for struct rtattr pointer fields for parse_rtattr*()
were just dropped since they are not needed.
The changes here allowed the compiler to discover some unused variables,
so get rid of them, too.
Signed-off-by: Phil Sutter <phil@nwl.cc>
Acked-by: David Ahern <dsa@cumulusnetworks.com>
Kernel gained support for filtering link dumps with commit dc599f76c22b
("net: Add support for filtering link dump by master device and kind").
Add support to ip link command. If a user passes master device or
kind to ip link command they are added to the link dump request message.
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
There have been reports about 'ip addr' printing "Message truncated" on
systems with large numbers of VFs. Although I haven't been able to get
my hands on hardware suitable to reproduce this, increasing the dump
buffer has been reported to resolve the issue. For want of a better
idea, just double the buffer size to 32k.
Feels like this opportunistic buffer size selection is rather
workarounding a design flaw in libnetlink or maybe even the netlink
protocol itself.
Signed-off-by: Phil Sutter <phil@nwl.cc>
This change is a no-op, as currently no code uses rtnl_talk on
NETLINK_SOCK_DIAG_BY_FAMILY sockets. It is needed to suppress
spurious errors when using SOCK_DESTROY via rtnl_talk.
Signed-off-by: Lorenzo Colitti <lorenzo@google.com>
There is two variables named 'len' in rtnl_talk. In fact, commit
c079e121a7 didn't work. For example, it was possible to trigger
a seg fault with this command:
$ ip link set gre2 type ip6gre hoplimit 32
Let's rename the argument len to maxlen.
Fixes: c079e121a7 ("libnetlink: add size argument to rtnl_talk")
Reported-by: Thomas Faivre <thomas.faivre@6wind.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
This patch introduces two new api's rta_nest and rta_nest_end to
nest attributes inside a rta attribute represented by 'struct rtattr'
as required to construct a nexthop. Also adds rta_addattr* variants
for u8, u16 and u64 as needed to support encapsulation.
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Acked-by: Jiri Benc <jbenc@redhat.com>
Add support for filtering neighbor dumps by master device. Kernel side
support provided by commit 21fdd092acc7. Since the feature is not
available in older kernels the user is given a warning message if the
kernel does not support the request.
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
There have been several instances where response from kernel
has overrun the stack buffer from the caller. Avoid future problems
by passing a size argument.
Also drop the unused peer and group arguments to rtnl_talk.
With this patch, it's now possible to listen in all netns that have an nsid
assigned into the netns where the socket is opened.
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Sometimes, it is more convenient to get only one specific nested attribute by
type. For example for IFLA_AF_SPEC where type is address family (AF_INET6).
So add this helper for this purpose.
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Replaced handling netlink messages by rtnl_dump_filter
from lib/libnetlink.c, also:
- removed unused dump_fp arg;
- added MAGIC_SEQ #define for 123456 seq id;
- silently exit if ENOENT errno is caused for NETLINK_SOCK_DIAG proto
in lib/libnetlink.c: rtnl_duml_filter_l(...) function. This fix
was added in a3fd8e58c1 by Eric
for misc/ss.c
Signed-off-by: Vadim Kochan <vadim4j@gmail.com>
Starting from linux-3.15 (commit 9063e21fb026, "netlink: autosize skb
lengths"), kernel is able to send up to 16K in netlink replies.
This change enables iproute2 commands to get bigger chunks,
without breaking compatibility with old kernels.
Signed-off-by: Eric Dumazet <edumazet@google.com>
This change corrects a kernel incompatibility that was resulting in the
ext_filter_mask not being correctly discovered by the kernel as it is buried
somewhere in the ifinfomsg.
Reported-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Acked-by: David S. Miller <davem@davemloft.net>
Recent kernel patches added support for VLAN filtering on the bridge.
This functionality allows one to turn a basic bridge into a VLAN bridge,
where VLANs dicatate packet forwarding and header transformation.
To configure the VLANs on the bridge and its ports a new command is
added to the 'bridge' utility.
# bridge vlan add dev eth0 vid 10 pvid untagged brdev
# bridge vlan add
# bridge vlan delete dev eth0 vid 10
# bridge vlan show
This command supports the following flags:
master - peform the operation on the software bridge device. This is
the default behavior.
self - perform the operation on the hardware associated with the port.
This flag is required when the device is the bridge device and
the configuration is desired on the bridge device itself (not
one of the ports).
pvid - Set the PVID (port vlan id) for a given port. Any untagged
frames arriving on the port will be assigned to this vlan.
untagged - Sets the egress policy of for a given vlan. Default port
egress policy is tagged. Set this flag if you wish traffic
associated with this VLAN to exit the port untagged.
Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Platforms have different alignment requirements which need to be
fulfilled by the compiler. If the structure elements are already
4 byte (NLMGS_ALIGNTO) aligned by the compiler adding an explicit
padding element (align_rta) is not allowed.
Use __attribute__ ((aligned (NLMSG_ALIGNTO))) in order to achieve
the required alignment.
Experienced on ARM (xscale) with symptom
netlink: 12 bytes leftover after parsing attributes
Tested on:
ARM (32bit Big Endian)
PowerPC (32bit Big Endian)
x86_64 (64bit Little Endian)
Each with different aligment requirments.
Signed-off-by: Lutz Jaenicke <ljaenicke@innominate.com>
Callers of rtnl_talk check errno value for their needs. In particular, the addrs
and routes restoring code validly reports success if the EEXISTS is in there.
However, the errno value can be sometimes screwed up by the perror call. Thus
we should only set it _after_ the message was emitted.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Add a new netlink attribute type to the dump request to allow
filtering of the information returned for the respective matching
interfaces. At this time the only filter defined is to request
virtual function (VF) device info for interfaces that attached VFs.
It will also be possible to extend the request with other yet to be
defined netlink attributes in the future.
Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
This is trivial patch for libnetlink.c in iproute2.
In iproute2/include/linux/netlink.h NLM_F_DUMP is defines as:
#define NLM_F_DUMP (NLM_F_ROOT|NLM_F_MATCH)
It is not used in libnetlink.c. If used, the code becomes a bit easier
to read.
Signed-off-by: Masatake YAMATO <yamato@redhat.com>
Both rtnl_talk and rtnl_dump had a callback for handling portions
of netlink message that do not match the correct pid or seq.
But this callback was never used by any part of iproute2 so remove
it.
The old 'ip addr flush' logic had several flaws:
* It reversed logic for primary v/s secondary flags
(though, it sort of worked right anyway)
* The code tried to remove secondaries and then primaries,
but in practice, it always removed one primary per loop,
which not at all efficient.
* The filter logic in the core would run only the first
filter in most cases.
* If you used '-s -s', the ifa_flags member would be
modified, which could make future filters fail
to function fine.
This patch attempts to fix all of these issues.
Tested-by: Brian Haley <brian.haley@hp.com>
Signed-off-by: Ben Greear <greearb@candelatech.com>
Modify the parser to keep track of the first of any duplicated attributes,
instead of the last. This is required for VF configuration reporting, where
multiple attributes of the same type are added sequentially.
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Unless promote_secondaries has been active deleting the primary address of
an interface will automatically delete all the secondary addresses.
In the case where ip flush requests the primary then secondary addresses to
be removed - which is the order the addresses are returned by the kernel -
this will cause an error as by the time the request to remove a secondary
address is made it will be missing as it will have been deleted in the
course of deleting the primary address.
This approach to solving this problem orders requests for the
deletion of secondary addresses before primary ones providing
rtnl_dump_filter_l(), a version of rtnl_dump_filter() that
iterates over a list of filters. And by providing two specialised
filters print_addrinfo_secondary() and print_addrinfo_primary().
rtnl_dump_filter_l() first iterates over all addresses using
print_addrinfo_secondary(), which appends secondary addresses to the
request buffer. Then again using print_addrinfo_primary() which appends
primary addresses.
This approach should work regardless of it promote_secondaries is
active or not. And regardless of if any primary of secondary addresses
are present or not.
Signed-off-by: Simon Horman <horms@verge.net.au>
It uses 1MB as receive buf limit by default (without
increasing /proc/sys/net/core/rmem_max it will be limited by less
however) and allows to specify the size manually using "-rcvbuf X"
(-r is already used, so you need to specify at least -rc).
Additionally rtnl_listen() continues on ENOBUFS after printing the
error message.
I experienced an error, if I try to perform a
ip route flush proto 4
with many routes in a complex environment, it
gave me the following error:
Failed to send flush request: Success
Flush terminated
Some usages of rtnl_send could cause errors (ie flush requests)
others do a listen afterwards.
Signed-off-by: Stephen Hemminger <stephen.hemminger@vyatta.com>
This fixes the problem where a bulk operation (like ip flush)
is performed as non-root user. The kernel will only send a response
if there is an error, so check for it.
If there is a problem talking to kernel, don't retry except in the
special case of signal or -EAGAIN
Signed-off-by: Stephen Hemminger <stephen.hemminger@vyatta.com>
This adds capability for iproute2 to send nested attributes to the
kernel, while maintaining backwards compatibility.
Signed-off-by: Patrick McHardy <kaber@trash.net>
(050929) version of 'ip addr' hangs while older versions worked.
The problem was traced to be a removed initialisation. The patch
below corrects this problem.