Commit Graph

2409 Commits

Author SHA1 Message Date
Daniel Borkmann
32e93fb7f6 {f,m}_bpf: allow for sharing maps
This larger work addresses one of the bigger remaining issues on
tc's eBPF frontend, that is, to allow for persistent file descriptors.
Whenever tc parses the ELF object, extracts and loads maps into the
kernel, these file descriptors will be out of reach after the tc
instance exits.

Meaning, for simple (unnested) programs which contain one or
multiple maps, the kernel holds a reference, and they will live
on inside the kernel until the program holding them is unloaded,
but they will be out of reach for user space, even worse with
(also multiple nested) tail calls.

For this issue, we introduced the concept of an agent that can
receive the set of file descriptors from the tc instance creating
them, in order to be able to further inspect/update map data for
a specific use case. However, while that is more tied towards
specific applications, it still doesn't easily allow for sharing
maps accross multiple tc instances and would require a daemon to
be running in the background. F.e. when a map should be shared by
two eBPF programs, one attached to ingress, one to egress, this
currently doesn't work with the tc frontend.

This work solves exactly that, i.e. if requested, maps can now be
_arbitrarily_ shared between object files (PIN_GLOBAL_NS) or within
a single object (but various program sections, PIN_OBJECT_NS) without
"loosing" the file descriptor set. To make that happen, we use eBPF
object pinning introduced in kernel commit b2197755b263 ("bpf: add
support for persistent maps/progs") for exactly this purpose.

The shipped examples/bpf/bpf_shared.c code from this patch can be
easily applied, for instance, as:

 - classifier-classifier shared:

  tc filter add dev foo parent 1: bpf obj shared.o sec egress
  tc filter add dev foo parent ffff: bpf obj shared.o sec ingress

 - classifier-action shared (here: late binding to a dummy classifier):

  tc actions add action bpf obj shared.o sec egress pass index 42
  tc filter add dev foo parent ffff: bpf obj shared.o sec ingress
  tc filter add dev foo parent 1: bpf bytecode '1,6 0 0 4294967295,' \
     action bpf index 42

The toy example increments a shared counter on egress and dumps its
value on ingress (if no sharing (PIN_NONE) would have been chosen,
map value is 0, of course, due to the two map instances being created):

  [...]
          <idle>-0     [002] ..s. 38264.788234: : map val: 4
          <idle>-0     [002] ..s. 38264.788919: : map val: 4
          <idle>-0     [002] ..s. 38264.789599: : map val: 5
  [...]

... thus if both sections reference the pinned map(s) in question,
tc will take care of fetching the appropriate file descriptor.

The patch has been tested extensively on both, classifier and
action sides.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2015-11-23 16:10:44 -08:00
Neil Horman
e149d4e843 iproute2: Ignore EADDRNOTAVAIL errors during address flush operation
I found recently that, if I disabled address promotion in the kernel, that
ip addr flush dev <dev>

would fail with an EADDRNOTAVAIL errno (though the flush operation would in fact
flush all addresses from an interface properly)

Whats happening is that, if I add a primary and multiple secondary addresses to
an interface, the flush operation first ennumerates them all with a GETADDR |
DUMP operation, then sends a delete request for each address.  But the kernel,
having promotion disabled, deletes all secondary addresses when the primary is
removed.  That means, that several delete requests may still be pending in the
netlink request for addresses that have been removed on our behalf, resulting in
EADDRNOTAVAIL return codes.

It seems the simplest thing to do is to understand that EADDRUNAVAIL isn't a
fatal outcome on a flush operation, as it just indicates that an address which
you want to remove is already removed, so it can safely be ignored.

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
CC: Stephen Hemminger <stephen@networkplumber.org>
CC: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
2015-11-23 15:59:08 -08:00
Phil Sutter
6e2e2cf03a bridge.8: document fdb replace command
Despite commit 45a82e5 ("iproute vxlan add support for fdb replace
command"), the 'fdb replace' command was not mentioned in bridge.8.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2015-11-23 15:58:07 -08:00
Phil Sutter
fdb347f7fd lnstat: fix header displaying mechanism
The algorithm depends on the loop counter ('i') to increment by one in
each iteration. Though if running endlessly (count==0), the counter was
not incremented at all.

Also change formatting of the header printing conditional a bit so it's
hopefully easier to read.

Fixes: e7e2913 ("lnstat: run indefinitely by default")
Signed-off-by: Phil Sutter <phil@nwl.cc>
2015-11-23 15:54:05 -08:00
Phil Sutter
869fcabecc lnstat: describe -s option in help output
Signed-off-by: Phil Sutter <phil@nwl.cc>
2015-11-23 15:54:05 -08:00
Stephen Hemminger
0198930b55 update kernel headers to 4.4-rc1
Post merge window changes
2015-11-23 15:53:04 -08:00
Phil Sutter
f7b49a3fc7 ip_common.h header cleanup
- Drop 'extern' keyword from all function prototypes.
- Make line breaking of print_* functions consistent.
- Make print_ntable() and ipntable_reset_filter() static and remove
  their declaration.
- Drop declaration of non-existent ipaddr_list() and iproute_monitor().

Signed-off-by: Phil Sutter <phil@nwl.cc>
2015-11-23 15:44:03 -08:00
Stephen Hemminger
23d6c997d9 misc: remove extra blank line 2015-11-23 15:42:34 -08:00
Stephen Hemminger
5699275b42 man8: scrub trailing whitespace
Remove extraneous whitespace
2015-11-23 15:41:37 -08:00
Ville Skyttä
ac0817ef66 man: Spelling fixes
Signed-off-by: Ville Skyttä <ville.skytta@iki.fi>
2015-11-23 15:39:25 -08:00
Ville Skyttä
85e3c87c82 man: Syntax and warning fixes
Fix syntax issues and warnings highlighted by `man --warnings=w' from
man-db 2.7.1.

Signed-off-by: Ville Skyttä <ville.skytta@iki.fi>
2015-11-23 15:39:25 -08:00
Phil Sutter
04ce8d3eda ip{,6}tunnel: put spaces around non-unary operators
Signed-off-by: Phil Sutter <phil@nwl.cc>
2015-11-23 15:26:37 -08:00
Phil Sutter
f53ecee818 iptunnel: sanitize copying tunnel name
Since p->name is only IFNAMSIZ bytes, do not copy more than IFNAMSIZ - 1
bytes into it so there remains at least a single null byte in the end.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2015-11-23 15:26:37 -08:00
Phil Sutter
c957821b18 iptunnel: share common code when determining the default interface name
Signed-off-by: Phil Sutter <phil@nwl.cc>
2015-11-23 15:26:37 -08:00
Phil Sutter
0dd4d2b37f iptunnel: simplify parsing TTL, allow 'hlim' as identifier
Instead of parsing an unsigned integer and checking boundaries, simply
parse u8. This and the added ttl alias 'hlim' provide consistency with
ip6tunnel.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2015-11-23 15:26:37 -08:00
Phil Sutter
2520598a1a iptunnel: share common code when setting tunnel mode
Signed-off-by: Phil Sutter <phil@nwl.cc>
2015-11-23 15:26:37 -08:00
Phil Sutter
7894ce7722 ip6tunnel: fix coding style: no newline between brace and else
Signed-off-by: Phil Sutter <phil@nwl.cc>
2015-11-23 15:26:37 -08:00
Phil Sutter
9af72f819e ip6tunnel: print local/remote addresses like iptunnel does
This makes output consistent with iptunnel, also supporting reverse DNS
lookup for remote address if requested.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2015-11-23 15:26:37 -08:00
Phil Sutter
c4527d7ba3 ip{,6}tunnel: align do_tunnels_list() a bit
In iptunnel, declare loop variables inside the loop as done in
ip6tunnel.

Fix and simplify goto logic in ip6tunnel:
- Failure to read over header lines would have left fp opened.
- By returning directly upon fopen() failure, fp can be closed
  unconditionally in the end.

Use the same goto logic in iptunnel, as well.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2015-11-23 15:26:37 -08:00
Phil Sutter
4b3cb96281 iptunnel: use ll_name_to_index() for physical interface lookup
Although the cache is only initialized in do_show(), this way it is at
least consistent with ip6tunnel.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2015-11-23 15:26:37 -08:00
Phil Sutter
6ddb1e8c90 ip{, 6}tunnel: unify behaviour if physical device is not found
Make ip6tunnel print an error message as well. While there, get rid of
unnecessary line breaking.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2015-11-23 15:26:37 -08:00
Phil Sutter
a7ed1520ee ip/tunnel: introduce tnl_parse_key()
Instead of duplicating the same code six times (key, ikey and okey in
iptunnel and ip6tunnel), have a common parsing routine. This has the
added benefit of having the same verbose error message in ip6tunnel as
well as iptunnel.

I'm not sure if parsing an IPv4 address as key makes sense for
ip6tunnel, but the code was there before so this patch at least doesn't
make it worse.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2015-11-23 15:26:37 -08:00
Phil Sutter
8de592d05c ip{, 6}tunnel: get rid of extraneous whitespace when printing
Put whitespace in the beginning of optional parts, not as suffix
anywhere. Also drop double whitespaces in between words.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2015-11-23 15:26:37 -08:00
Aaro Koskinen
caf8875b3c misc/Makefile: use PKG_CONFIG
Use PKG_CONFIG from Config - it works better when cross-compiling.

Signed-off-by: Aaro Koskinen <aaro.koskinen@nokia.com>
2015-11-23 15:25:50 -08:00
Stephen Hemminger
115b4d8873 Merge branch 'master' into net-next 2015-11-03 16:38:15 -08:00
Stephen Hemminger
6720eceff7 v4.3.0 2015-11-03 16:34:46 -08:00
Stephen Hemminger
1e5aa99024 Merge branch 'master' into net-next 2015-11-03 16:31:57 -08:00
Phil Sutter
b5bb1820e8 lib/utils: improve error messages of get_addr() and get_prefix()
Instead of statically complaining about illegal inet address, use
get_family() to get the address family right.

Based on a patch by Hangbin Liu to print "inet6" for AF_INET6 made more
generic by me.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2015-11-03 16:28:36 -08:00
Phil Sutter
bd5bbad450 bridge: fdb: minor syntax fix in help text 2015-11-03 16:27:39 -08:00
Phil Sutter
17c53fcd2c ifstat: add manpage 2015-11-03 16:27:39 -08:00
Phil Sutter
7124942942 genl: add manpage 2015-11-03 16:27:39 -08:00
Phil Sutter
958cd21094 ifcfg: add manpage 2015-11-03 16:27:39 -08:00
Stephen Hemminger
dddf1b4412 add new IFLA_VF_TRUST netlink attribute 2015-10-23 15:47:07 -07:00
Stephen Hemminger
86c392f958 Merge branch 'master' into net-next 2015-10-23 15:46:08 -07:00
Stephen Hemminger
1473bda921 misc: cleanup extra whitespace
No blank lines at end of file
2015-10-23 15:44:30 -07:00
Stephen Hemminger
753ef5bbd6 tc: remove extra whitespace
No blank lines at EOF, or trailing whitespace.
2015-10-23 15:43:28 -07:00
Stephen Hemminger
f7520a1998 ip: remove extra newlines at end-of-file
Shouldn't have extra blank lines.
2015-10-23 15:41:58 -07:00
Phil Sutter
a257bc7b4c tc: ship filter man pages and refer to them in tc.8
Cc: Thomas Graf <tgraf@suug.ch>
Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Cc: Jiri Pirko <jiri@resnulli.us>
Cc: Patrick McHardy <kaber@trash.net>
Cc: Werner Almesberger <werner@almesberger.net>
Signed-off-by: Phil Sutter <phil@nwl.cc>
2015-10-23 15:39:28 -07:00
Phil Sutter
f15a23966f tc: add a man page for u32 filter
Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Signed-off-by: Phil Sutter <phil@nwl.cc>
2015-10-23 15:39:28 -07:00
Phil Sutter
fc7a72f1eb tc: add a man page for tcindex filter
Cc: Werner Almesberger <werner@almesberger.net>
Signed-off-by: Phil Sutter <phil@nwl.cc>
2015-10-23 15:37:26 -07:00
Phil Sutter
02dddd6110 tc: add a man page for route filter
Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Signed-off-by: Phil Sutter <phil@nwl.cc>
2015-10-23 15:37:26 -07:00
Phil Sutter
49891ba177 tc: add a man page for fw filter
Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Signed-off-by: Phil Sutter <phil@nwl.cc>
2015-10-23 15:37:26 -07:00
Phil Sutter
b3aa12a401 tc: add a man page for flower filter
Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Phil Sutter <phil@nwl.cc>
2015-10-23 15:37:26 -07:00
Phil Sutter
334ddc9b4d tc: add a man page for flow filter
Cc: Patrick McHardy <kaber@trash.net>
Signed-off-by: Phil Sutter <phil@nwl.cc>
2015-10-23 15:37:26 -07:00
Phil Sutter
5774f09ee8 tc: add a man page for cgroup filter
Cc: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Phil Sutter <phil@nwl.cc>
2015-10-23 15:37:26 -07:00
Phil Sutter
55b35567ad tc: add a man page for basic filter
Cc: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Phil Sutter <phil@nwl.cc>
2015-10-23 15:37:26 -07:00
Phil Sutter
40eb737ebb tc: u32 filter coding style cleanup
Add missing spaces around operators to increase readability. Aside from
that, make "preference" match a real synonym for "tos" and "dsfield" as
it's effect was identical to them.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2015-10-23 15:37:26 -07:00
Phil Sutter
0a83e1eaf7 tc: improve filter help texts a bit
This fixes a few syntax errors and changes route filter help text to use
classid instead of flowid to be consistent with other filters' help
texts.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2015-10-23 15:37:26 -07:00
Stephen Hemminger
c518d3a7f7 update bpf kernel header 2015-10-22 23:43:35 -07:00
Stephen Hemminger
651dccbee7 Merge branch 'master' into net-next 2015-10-22 23:42:37 -07:00