4413 Commits

Author SHA1 Message Date
Or Gerlitz
761ec9e29f tc/flower: Add match on encapsulating tos/ttl
Add matching on tos/ttl of the IP tunnel headers.

For example, here's a decap rule that matches on the tunnel tos:

tc filter add dev vxlan_sys_4789 protocol ip parent ffff: prio 10 flower \
   enc_src_ip 192.168.10.2 enc_dst_ip 192.168.10.1 enc_key_id 100 enc_dst_port 4789 enc_tos 0x30 \
   src_mac e4:11:22:33:44:70 dst_mac e4:11:22:33:44:50  \
   action tunnel_key unset \
   action mirred egress redirect dev eth0_0

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-07-20 08:59:11 -07:00
Or Gerlitz
9f89b0cc0e tc/act_tunnel_key: Enable setup of tos and ttl
Allow setting tos and ttl for the tunnel.

For example, here's an encap rule that sets the tos of the tunnel:

tc filter add dev eth0_0 protocol ip parent ffff: prio 10 flower \
   src_mac e4:11:22:33:44:50 dst_mac e4:11:22:33:44:70 \
   action tunnel_key set src_ip 192.168.10.1 dst_ip 192.168.10.2 id 100 dst_port 4789 tos 0x30 \
   action mirred egress redirect dev vxlan_sys_4789

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-07-20 08:58:31 -07:00
David Ahern
204db84eb8 Update kernel headers
Update kernel headers to
a3eed83a1895 ("Merge branch 'qed-Add-support-for-phy-module-query'")

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-07-20 08:57:23 -07:00
Toke Høiland-Jørgensen
77c9fbd06e q_cake: Rename autorate_ingress parameter to use dash as word separator
This is consistent with the other multi-word parameters. Also change the
JSON output to be consistent with the way it is formatted for the other
options.

Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-07-20 08:46:42 -07:00
Jesus Sanchez-Palencia
b625e36108 tc: Do not use addattr_nest_compat on mqprio and netem
Here we are partially reverting commit c14f9d92ee
"treewide: Use addattr_nest()/addattr_nest_end() to handle nested
attributes".

As discussed in [1], changing from the 'manually' coded version that
used addattr_l() to addattr_nest_compat() wasn't functionally
equivalent, because now the messages have extra fields appended to them.

This introduced a regression since the implementation of parse_attr()
from both mqprio and netem can't handle this new message format.

Without this fix, mqprio returns an error. netem won't return an error
but its internal configuration ends up wrong.
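
The size difference between the two encodings can be sketched as follows.
This is a simplified illustration, not iproute2 code: the helper names are
hypothetical, the 4-byte attribute header (16-bit length, 16-bit type)
follows the rtattr layout, and the sketch assumes that the compat variant
emits one additional, empty nested attribute header after the fixed-size
payload.

```python
import struct

RTA_HDRLEN = 4  # struct rtattr: u16 rta_len, u16 rta_type


def align4(n):
    """Round n up to the 4-byte netlink alignment."""
    return (n + 3) & ~3


def plain_attr(attr_type, payload):
    """Attribute holding only the fixed-size payload (addattr_l() style)."""
    length = RTA_HDRLEN + len(payload)
    body = struct.pack("=HH", length, attr_type) + payload
    return body.ljust(align4(length), b"\0")


def compat_attr(attr_type, payload):
    """Assumed compat layout: payload followed by an empty nested header."""
    inner = struct.pack("=HH", RTA_HDRLEN, attr_type)  # empty nest
    padded = align4(RTA_HDRLEN + len(payload))
    # The outer length covers the payload and the nested header.
    head = struct.pack("=HH", padded + len(inner), attr_type) + payload
    return head.ljust(padded, b"\0") + inner


payload = bytes(8)  # stand-in for a fixed-size options struct
attr_type = 2       # arbitrary attribute type for the illustration
print(len(plain_attr(attr_type, payload)), len(compat_attr(attr_type, payload)))
```

A parser that expects its own attributes to start right after the
fixed-size payload would find the extra header there instead.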

As an example, this can be reproduced by the following commands when
this patch is not applied:

 1) mqprio
$ tc qdisc replace dev enp3s0 parent root handle 100 mqprio \
	num_tc 3 map 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 \
	queues 1@0 1@1 2@2 hw 0

RTNETLINK answers: Numerical result out of range

 2) netem
$ tc qdisc add dev enp3s0 root netem rate 5kbit 20 100 5 \
	distribution normal latency 1 1

$ tc -s qdisc

(...)
qdisc netem 8001: dev enp3s0 root refcnt 9 limit 1000 delay 0us  0us
 Sent 402 bytes 1 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
(...)

With this patch applied, the tc -s qdisc command above for netem instead
reads:

(...)
qdisc netem 8002: dev enp3s0 root refcnt 9 limit 1000 delay 0us  0us \
	rate 5Kbit packetoverhead 20 cellsize 100 celloverhead 5
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
(...)

[1] https://patchwork.ozlabs.org/patch/867860/#1893405

Fixes: c14f9d92ee ("treewide: Use addattr_nest()/addattr_nest_end() to handle nested attributes")
Reported-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: Jesus Sanchez-Palencia <jesus.sanchez-palencia@intel.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-07-19 15:50:07 -07:00
Toke Høiland-Jørgensen
714444c0cb Add support for CAKE qdisc
sch_cake is intended to squeeze the most bandwidth and latency out of even
the slowest ISP links and routers, while presenting an API simple enough
that even an ISP can configure it.

Example of use on a cable ISP uplink:

tc qdisc add dev eth0 cake bandwidth 20Mbit nat docsis ack-filter

To shape a cable download link (ifb and tc-mirred setup elided):

tc qdisc add dev ifb0 cake bandwidth 200mbit nat docsis ingress wash besteffort

Cake is filled with:

* A hybrid Codel/Blue AQM algorithm, "Cobalt", tied to an FQ_Codel
  derived Flow Queuing system, which autoconfigures based on the bandwidth.
* A novel "triple-isolate" mode (the default) which balances per-host
  and per-flow FQ even through NAT.
* A deficit-based shaper, which can also be used in an unlimited mode.
* 8 way set associative hashing to reduce flow collisions to a minimum.
* A reasonable interpretation of various diffserv latency/loss tradeoffs.
* Support for zeroing diffserv markings for entering and exiting traffic.
* Support for interacting well with Docsis 3.0 shaper framing.
* Support for DSL framing types and shapers.
* Support for ack filtering.
* Extensive statistics for measuring loss, ECN markings, and latency variation.

Various in-progress versions have been available as an out of tree build for
kernel versions going back to 3.10, as the embedded router world has been
running a few years behind mainline Linux. A stable version has been
generally available on lede-17.01 and later.

sch_cake replaces a combination of iptables, tc filter, htb and fq_codel
in the sqm-scripts, with sane defaults and vastly simpler configuration.

Cake's principal author is Jonathan Morton, with contributions from
Kevin Darbyshire-Bryant, Toke Høiland-Jørgensen, Sebastian Moeller,
Ryan Mounce, Tony Ambardar, Dean Scarff, Nils Andreas Svee, Dave Täht,
and Loganaden Velvindron.

Testing from Pete Heist, Georgios Amanakis, and the many other members of
the cake@lists.bufferbloat.net mailing list.

Signed-off-by: Dave Taht <dave.taht@gmail.com>
Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-07-19 09:23:46 -07:00
Alex Vesker
8b4fbf0bed devlink: Add support for devlink-region access
Devlink region allows access to driver defined address regions.
Each device can create its supported address regions and register
them. A device which exposes a region will allow access to it
using devlink.

This support allows reading and dumping regions snapshots as well
as presenting information such as region size and current available
snapshots.

A snapshot represents a memory image of a region taken by the driver.
If a device collects a snapshot of an address region it can be later
exposed using devlink region read or dump commands.
This functionality allows for future analyses on the snapshots.

The dump command reads the full address space of a region or of a
snapshot, whereas the read command reads only a specific section of a
region/snapshot, indicated by an address and a length. Current support
is limited to reading and dumping a previously taken snapshot ID.

New commands added:
 devlink region show [ DEV/REGION ]
 devlink region delete DEV/REGION snapshot SNAPSHOT_ID
 devlink region dump DEV/REGION [ snapshot SNAPSHOT_ID ]
 devlink region read DEV/REGION [ snapshot SNAPSHOT_ID ]
                                address ADDRESS length LENGTH

Signed-off-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-07-19 09:20:15 -07:00
Qiaobin Fu
697dce7b3a net:sched: add action inheritdsfield to skbedit
The new action inheritdsfield copies the DS field of
IPv4 and IPv6 packets into skb->priority. This enables
later classification of packets based on the DS field.
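
As a rough illustration of where the DS field sits in each header, here is
a minimal sketch. The function names are hypothetical, and the mapping to
the priority value (the 6-bit DSCP, i.e. the DS byte with the two ECN bits
shifted out) is an assumption about the kernel side rather than something
stated in this commit.

```python
def ds_field(header: bytes) -> int:
    """Return the 8-bit DS/traffic-class byte of a raw IPv4 or IPv6 header."""
    version = header[0] >> 4
    if version == 4:
        return header[1]  # second byte of the IPv4 header
    if version == 6:
        # The traffic class straddles the first two bytes of the IPv6 header.
        return ((header[0] & 0x0F) << 4) | (header[1] >> 4)
    raise ValueError("not an IPv4 or IPv6 header")


def inherited_priority(header: bytes) -> int:
    """Assumed mapping: priority is the DS byte without its two ECN bits."""
    return ds_field(header) >> 2


# Minimal headers carrying DS byte 0xb8 (DSCP 46), all other fields zeroed.
ipv4 = bytes([0x45, 0xB8]) + bytes(18)
ipv6 = bytes([0x6B, 0x80]) + bytes(38)
print(inherited_priority(ipv4), inherited_priority(ipv6))
```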

v4:
* Make tc use netlink helper functions

v3:
* Make flag represented in JSON output as a null value

v2:
* Align the output syntax with the input syntax

* Fix the style issues

Original idea by Jamal Hadi Salim <jhs@mojatatu.com>

Signed-off-by: Qiaobin Fu <qiaobinf@bu.edu>
Reviewed-by: Michel Machado <michel@digirati.com.br>
Reviewed-by: Cong Wang <xiyou.wangcong@gmail.com>
Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Reviewed-by: Stephen Hemminger <stephen@networkplumber.org>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-07-19 09:17:56 -07:00
Mathieu Xhonneux
04cb3c0d43 ip: add support for seg6local End.BPF action
This patch adds support for the End.BPF action of the seg6local
lightweight tunnel. Functions from the BPF lightweight tunnel are
re-used in this patch. Example:

$ ip -6 route add fc00::18 encap seg6local action End.BPF endpoint
obj my_bpf.o sec my_func dev eth0

$ ip -6 route show fc00::18
fc00::18  encap seg6local action End.BPF endpoint my_bpf.o:[my_func]
dev eth0 metric 1024 pref medium

v2: - re-use of print_encap_bpf_prog instead of fprintf
    - introduction of "endpoint" keyword for more consistency with
      other parameters

Signed-off-by: Mathieu Xhonneux <m.xhonneux@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-07-18 15:56:18 -07:00
Serhey Popovych
8df708afd6 ipaddress: Fix and make consistent label match handling
Since commit 9516823051 ("ipaddress: Improve print_linkinfo()") we
return -1, instead of 0 as before that change, when the ip-address(8)
label does not match the network device name. This causes a regression
when trying to output ip addresses matching a label:

     # ip addr add 192.168.192.1/24 dev lo label lo:1
     # ip addr show label lo:1
     <no output>

This is a special case: print_linkinfo() returns 0 early, matching
only filter.ifindex and filter.up if given, but not the remaining fields
in @filter. ipaddr_list_flush_or_save() then calls
print_selected_addrinfo() without calling print_link_stats().

Later print_selected_addrinfo() calls print_addrinfo() that finally
matches IFA_LABEL attribute in netlink buffer with filter.label using
ifa_label_match_rta().

On the other hand there are three conditions checked in print_linkinfo()
to determine label special case:

    1) filter.label != NULL
    2) filter.family == AF_UNSPEC || filter.family == AF_PACKET
    3) fnmatch(filter.label, name, 0)

With 1) it is ok to check if filtering by label is on by given pattern
in @filter.label.

Since label is IPv4 specific and AF_PACKET is for printing ip-link(8)
information (see ipaddr_link_list()::ipaddress.c as example) checking
for AF_PACKET in 2) doesn't make much sense: better to defer these
checks to print_addrinfo() to determine valid combinations before calling
ifa_label_match_rta() to finally match IFA_LABEL to pattern in
filter.label.

For 3) we have following call for test case:

    fnmatch(pattern, string, flags) ->
      fnmatch(filter.label, name, 0) ->
        fnmatch("lo:1", "lo", 0) == FNM_NOMATCH (1) or non-zero on error

To support special case in print_linkinfo() for filtering by label we
only need to check if label pattern is given in filter.label and return
0 to skip print_link_stats() in ipaddr_list_flush_or_save(): actual
filtering will be done in print_addrinfo().
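
The matching behaviour described above can be tried out with a small
sketch. It uses Python's fnmatch module as a stand-in for the C fnmatch(3)
call; label_matches is a hypothetical helper, and the two implementations
are only assumed to agree for simple patterns like the ones used here.

```python
from fnmatch import fnmatchcase


def label_matches(pattern: str, name: str) -> bool:
    """Mirror the C test fnmatch(pattern, name, 0) == 0.

    Python's fnmatchcase() takes (name, pattern), the reverse of the C order.
    """
    return fnmatchcase(name, pattern)


print(label_matches("lo:1", "lo"))    # label pattern vs. device name
print(label_matches("lo:1", "lo:1"))  # label pattern vs. IFA_LABEL
print(label_matches("lo:*", "lo:1"))  # wildcard label pattern
print(label_matches("lo", "lo"))      # label equal to the device name
```

Only the first comparison fails, which is why matching the label pattern
against the device name hides the lo:1 address.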

Before commit 9516823051 ("ipaddress: Improve print_linkinfo()"):
-------------------------------------------------------------------

$ ip addr sh label lo
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN \
group default qlen 1000
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
                                          fnmatch("lo", "lo", 0) == 0
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
$ ip addr show label 'lo:*'
    inet 192.168.192.1/24 scope global lo:1
       valid_lft forever preferred_lft forever
$ ip addr sh label lo:1
    inet 192.168.192.1/24 scope global lo:1
       valid_lft forever preferred_lft forever
$ ip -4 addr sh label lo:1
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN \
group default qlen 1000
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
                                             filter.family == AF_INET
    inet 192.168.192.1/24 scope global lo:1
       valid_lft forever preferred_lft forever

After this change is applied:
-----------------------------

$ ip/ip addr show label lo
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
$ ip/ip addr show label 'lo:*'
    inet 192.168.192.1/24 scope global lo:1
        valid_lft forever preferred_lft forever
$ ip/ip addr show label lo:1
    inet 192.168.192.1/24 scope global lo:1
       valid_lft forever preferred_lft forever
$ ip/ip -4 addr show label lo:1
    inet 192.168.192.1/24 scope global lo:1
       valid_lft forever preferred_lft forever

Note that we no longer show link information as we did previously:
    we are filtering by "label" pattern, not showing by "dev".

Fixes: commit 9516823051 ("ipaddress: Improve print_linkinfo()")
Reported-by: Vincent Bernat <vincent@bernat.im>
Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-07-18 15:52:55 -07:00
David Ahern
b05a68f721 Merge branch 'bpf-btf' into iproute2-next
Daniel Borkmann says:

====================

Main part of this set is to: i) avoid strict af_alg kernel dependency,
ii) add loader support for bpf to bpf calls and iii) add btf loader
support with an option to annotate maps. For details please see the
individual patches. Thanks!

====================

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-07-17 19:39:06 -07:00
Daniel Borkmann
f823f36012 bpf: implement btf handling and map annotation
Implement loading of .BTF section from object file and build up
internal table for retrieving key/value id related to maps in
the BPF program. Latter is done by setting up struct btf_type
table.

One of the issues is that there's a disconnect between the data
types used in the map and struct bpf_elf_map, meaning the underlying
types are unknown from the map description. One way to overcome
this is to add an annotation such that the loader will recognize
the relation to both. BPF_ANNOTATE_KV_PAIR(map_foo, struct key,
struct val); has been added to the API that programs can use.

The loader will then pick the corresponding key/value type ids and
attach it to the maps for creation. This can later on be dumped via
bpftool for introspection.

Example with test_xdp_noinline.o from kernel selftests:

  [...]

  struct ctl_value {
        union {
                __u64 value;
                __u32 ifindex;
                __u8 mac[6];
        };
  };

  struct bpf_map_def __attribute__ ((section("maps"), used)) ctl_array = {
        .type		= BPF_MAP_TYPE_ARRAY,
        .key_size	= sizeof(__u32),
        .value_size	= sizeof(struct ctl_value),
        .max_entries	= 16,
        .map_flags	= 0,
  };
  BPF_ANNOTATE_KV_PAIR(ctl_array, __u32, struct ctl_value);

  [...]

Above could also further be wrapped in a macro. Compiling through LLVM and
converting to BTF:

  # llc --version
  LLVM (http://llvm.org/):
    LLVM version 7.0.0svn
    Optimized build.
    Default target: x86_64-unknown-linux-gnu
    Host CPU: skylake

    Registered Targets:
      bpf    - BPF (host endian)
      bpfeb  - BPF (big endian)
      bpfel  - BPF (little endian)
  [...]

  # clang [...] -O2 -target bpf -g -emit-llvm -c test_xdp_noinline.c -o - |
    llc -march=bpf -mcpu=probe -mattr=dwarfris -filetype=obj -o test_xdp_noinline.o
  # pahole -J test_xdp_noinline.o

Checking pahole dump of BPF object file:

  # file test_xdp_noinline.o
  test_xdp_noinline.o: ELF 64-bit LSB relocatable, *unknown arch 0xf7* version 1 (SYSV), with debug_info, not stripped
  # pahole test_xdp_noinline.o
  [...]
  struct ctl_value {
	union {
		__u64              value;                /*     0     8 */
		__u32              ifindex;              /*     0     4 */
		__u8               mac[0];               /*     0     0 */
	};                                               /*     0     8 */

	/* size: 8, cachelines: 1, members: 1 */
	/* last cacheline: 8 bytes */
  };

Now loading into kernel and dumping the map via bpftool:

  # ip -force link set dev lo xdp obj test_xdp_noinline.o sec xdp-test
  # ip a
  1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 xdpgeneric/id:227 qdisc noqueue state UNKNOWN group default qlen 1000
      link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
      inet 127.0.0.1/8 scope host lo
         valid_lft forever preferred_lft forever
      inet6 ::1/128 scope host
         valid_lft forever preferred_lft forever
  [...]
  # bpftool prog show id 227
  227: xdp  tag a85e060c275c5616  gpl
      loaded_at 2018-07-17T14:41:29+0000  uid 0
      xlated 8152B  not jited  memlock 12288B  map_ids 381,385,386,382,384,383
  # bpftool map dump id 386
   [{
        "key": 0,
        "value": {
            "": {
                "value": 0,
                "ifindex": 0,
                "mac": []
            }
        }
    },{
        "key": 1,
        "value": {
            "": {
                "value": 0,
                "ifindex": 0,
                "mac": []
            }
        }
    },{
  [...]

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-07-17 19:38:44 -07:00
Daniel Borkmann
b5cb33aec6 bpf: implement bpf to bpf calls support
Implement missing bpf to bpf calls support. The loader will
recognize .text section and handle relocation entries that
are emitted by LLVM.

First step is processing of map related relocation entries
for .text section, and in a second step loader will copy .text
section into program section and adjust call instruction
offset accordingly.

Example with test_xdp_noinline.o from kernel selftests:

 1) Every function as __attribute__ ((always_inline)), rest
    left unchanged:

  # ip -force link set dev lo xdp obj test_xdp_noinline.o sec xdp-test
  # ip a
  1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 xdpgeneric/id:233 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
  [...]
  # bpftool prog dump xlated id 233
  [...]
  1669: (2d) if r3 > r2 goto pc+4
  1670: (79) r2 = *(u64 *)(r10 -136)
  1671: (61) r2 = *(u32 *)(r2 +0)
  1672: (63) *(u32 *)(r1 +0) = r2
  1673: (b7) r0 = 1
  1674: (95) exit        <-- 1674 insns total

 2) Every function as __attribute__ ((noinline)), rest
    left unchanged:

  # ip -force link set dev lo xdp obj test_xdp_noinline.o sec xdp-test
  # ip a
  1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 xdpgeneric/id:236 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
  [...]
  # bpftool prog dump xlated id 236
  [...]
  1000: (bf) r1 = r6
  1001: (b7) r2 = 24
  1002: (85) call pc+3   <-- pc-relative call insns
  1003: (1f) r7 -= r0
  1004: (bf) r0 = r7
  1005: (95) exit
  1006: (bf) r0 = r1
  1007: (bf) r1 = r2
  1008: (67) r1 <<= 32
  1009: (77) r1 >>= 32
  1010: (bf) r3 = r0
  1011: (6f) r3 <<= r1
  1012: (87) r2 = -r2
  1013: (57) r2 &= 31
  1014: (67) r0 <<= 32
  1015: (77) r0 >>= 32
  1016: (7f) r0 >>= r2
  1017: (4f) r0 |= r3
  1018: (95) exit        <-- 1018 insns total

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-07-17 19:38:43 -07:00
Daniel Borkmann
6e5094dbb7 bpf: remove strict dependency on af_alg
Do not bail out when AF_ALG is not supported by the kernel and
only do so when a map is requested in object ns where we're
calculating the hash. Otherwise, the loader can operate just
fine, therefore let's not fail early when it's not needed.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-07-17 19:38:40 -07:00
Daniel Borkmann
282a1fe1f8 bpf: move bpf_elf_map fixup notification under verbose
No need to spam the user with this if it can be fixed gracefully
anyway. Therefore, move it under verbose option.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-07-17 19:38:38 -07:00
David Ahern
5081979176 Import btf.h from kernel headers
Import btf.h from kernel headers at commit
    2aa4a3378ad0 ("Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next")
which is the last sync point.

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-07-17 19:37:50 -07:00
Roopa Prabhu
9c6a6d84ee ipneigh: exclude NTF_EXT_LEARNED from default filter
NUD_NOARP entries are filtered out by default by iproute2.
We don't want NUD_NOARP with NTF_EXT_LEARNED flag filtered out.
This patch extends the default filter check for ip neigh show
to include the NTF_EXT_LEARNED flag.

Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-07-17 18:57:21 -07:00
Jakub Kicinski
da083b5a48 iplink: add support for reporting multiple XDP programs
Kernel now supports attaching XDP programs in the driver
and hardware at the same time.  Print that information
correctly.

In case there are multiple programs attached kernel will
not provide IFLA_XDP_PROG_ID, so don't expect it to be
there (this also improves the printing for very old kernels
slightly, as it avoids unnecessary "prog/xdp" line).

In short mode preserve the current outputs but don't print
IDs if there are multiple.

6: netdevsim0: <BROADCAST,NOARP> mtu 1500 xdpoffload/id:11 qdisc [...]

and:

6: netdevsim0: <BROADCAST,NOARP> mtu 1500 xdpmulti qdisc [...]

ip link output will keep using prog/xdp prefix if only one program
is attached, but can also print multiple program lines:

    prog/xdp id 8 tag fc7a51d1a693a99e jited

vs:

    prog/xdpdrv id 8 tag fc7a51d1a693a99e jited
    prog/xdpoffload id 9 tag fc7a51d1a693a99e

JSON output gains a new array called "attached" which will
contain the full list of attached programs along with their
attachment modes:

        "xdp": {
            "mode": 3,
            "prog": {
                "id": 11,
                "tag": "fc7a51d1a693a99e",
                "jited": 0
            },
            "attached": [ {
                    "mode": 3,
                    "prog": {
                        "id": 11,
                        "tag": "fc7a51d1a693a99e",
                        "jited": 0
                    }
                } ]
        },

In case there are multiple programs attached the general "xdp"
section will not contain program information:

        "xdp": {
            "mode": 4,
            "attached": [ {
                    "mode": 1,
                    "prog": {
                        "id": 10,
                        "tag": "fc7a51d1a693a99e",
                        "jited": 1
                    }
                },{
                    "mode": 3,
                    "prog": {
                        "id": 11,
                        "tag": "fc7a51d1a693a99e",
                        "jited": 0
                    }
                } ]
        },
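
A consumer of this JSON could walk the "attached" array roughly as
follows. This is a sketch only: attached_programs is a hypothetical helper,
the sample wraps the fragment above in braces to make it valid JSON, and
the mode-to-name table is an assumption based on the kernel's
XDP_ATTACHED_* values that should be checked against linux/if_link.h.

```python
import json

# Assumed XDP_ATTACHED_* values: 1 = driver, 2 = generic/skb,
# 3 = offload, 4 = multiple programs.
MODE_NAMES = {1: "xdpdrv", 2: "xdpgeneric", 3: "xdpoffload", 4: "xdpmulti"}


def attached_programs(xdp: dict) -> list:
    """Return (mode name, program id) for each entry of "attached"."""
    return [
        (MODE_NAMES.get(entry["mode"], "unknown"), entry["prog"]["id"])
        for entry in xdp.get("attached", [])
    ]


sample = json.loads("""
{"xdp": {"mode": 4, "attached": [
    {"mode": 1, "prog": {"id": 10, "tag": "fc7a51d1a693a99e", "jited": 1}},
    {"mode": 3, "prog": {"id": 11, "tag": "fc7a51d1a693a99e", "jited": 0}}
]}}
""")
print(attached_programs(sample["xdp"]))
```

Reading "attached" rather than the top-level "prog" object works in both
the single-program and the multi-program case.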

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-07-15 13:10:03 -07:00
Jianbo Liu
1f0a5dfd38 tc: flower: Add support for QinQ
To support matching on both outer and inner vlan headers,
we add new cvlan_id/cvlan_prio/cvlan_ethtype for inner vlan header.

Example:
# tc filter add dev eth0 protocol 802.1ad parent ffff: \
    flower vlan_id 1000 vlan_ethtype 802.1q \
        cvlan_id 100 cvlan_ethtype ipv4 \
    action vlan pop \
    action vlan pop \
    action mirred egress redirect dev eth1

# tc filter show dev eth0 ingress
filter protocol 802.1ad pref 1 flower chain 0
filter protocol 802.1ad pref 1 flower chain 0 handle 0x1
  vlan_id 1000
  vlan_ethtype 802.1Q
  cvlan_id 100
  cvlan_ethtype ip
  eth_type ipv4
  in_hw

Signed-off-by: Jianbo Liu <jianbol@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-07-15 13:03:50 -07:00
David Ahern
3eebc1d4f4 Update kernel headers
Update kernel headers to commit
2aa4a3378ad0 ("Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next")

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-07-15 13:02:51 -07:00
David Ahern
5910422d21 Merge branch 'tc-etf' into iproute2-next
Jesus Sanchez-Palencia says:

====================

fixes since v3:
 - Add support for clock names with the "CLOCK_" prefix;
 - Print clock name on print_opt();
 - Use strcasecmp() instead of strncasecmp().

The ETF (earliest txtime first) qdisc was recently merged into net-next
[1], so this patchset adds support for it through the tc command line
tool.

An initial man page is also provided.

The first commit in this series is adding an updated version of
include/uapi/linux/pkt_sched.h and is not meant to be merged. It's
provided here just as a convenience for those who want to easily build
this patchset.

[1] https://patchwork.ozlabs.org/cover/938991/

====================

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-07-11 17:51:46 -07:00
Jesus Sanchez-Palencia
85d699c3a8 man: Add initial manpage for tc-etf(8)
Add an initial manpage for tc-etf covering all config options, basic
concepts and operation modes.

Signed-off-by: Jesus Sanchez-Palencia <jesus.sanchez-palencia@intel.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-07-11 17:50:53 -07:00
Vinicius Costa Gomes
7da5ef2200 tc: Add support for the ETF Qdisc
The "Earliest TxTime First" (ETF) queueing discipline allows precise
control of the transmission time of packets by providing a sorted
time-based scheduling of packets.

The syntax is:

tc qdisc add dev DEV parent NODE etf delta <DELTA>
                     clockid <CLOCKID> [offload] [deadline_mode]

Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: Jesus Sanchez-Palencia <jesus.sanchez-palencia@intel.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-07-11 17:50:10 -07:00
Stephen Hemminger
b49759c0e7 tc: don't double print rate
Conversion to print stats in JSON forgot to remove existing
fprintf.

Fixes: 4fcec7f366 ("tc: jsonify stats2")
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-07-09 09:53:45 -07:00
Jesus Sanchez-Palencia
4df5bb1be0 man: Fix typos on tc-cbs
Fix 2 typos on the man page of the CBS qdisc.

Signed-off-by: Jesus Sanchez-Palencia <jesus.sanchez-palencia@intel.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-07-07 09:57:45 -07:00
fumihiko kakuma
d529ea2ff4 tc: Fix the bug not to display prio and quantum options of htb
A command line like 'tc -d class show dev dev-name' does not
display the values of the prio and quantum options when we use htb qdisc.
This patch fixes the bug.

Signed-off-by: Fumihiko Kakuma <kakuma@valinux.co.jp>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-07-07 09:57:45 -07:00
Roi Dayan
425dcc2741 tc: Fix output of ip attributes
Example output is of tos and ttl.
Before this fix, the %x format caused the pointer to be printed
instead of the intended string created in the out variable.

Fixes: e28b88a464 ("tc: jsonify flower filter")
Signed-off-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-07-07 09:57:45 -07:00
Stephen Hemminger
dc3ef235f3 uapi: update bpf.h
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-07-07 09:56:27 -07:00
Simon Horman
6217917a38 tc: m_tunnel_key: Add tunnel option support to act_tunnel_key
Allow setting tunnel options using the act_tunnel_key action.

Options are expressed as class:type:data and multiple options
may be listed using a comma delimiter.

 # ip link add name geneve0 type geneve dstport 0 external
 # tc qdisc add dev eth0 ingress
 # tc filter add dev eth0 protocol ip parent ffff: \
     flower indev eth0 \
        ip_proto udp \
        action tunnel_key \
            set src_ip 10.0.99.192 \
            dst_ip 10.0.99.193 \
            dst_port 6081 \
            id 11 \
            geneve_opts 0102:80:00800022,0102:80:00800022 \
    action mirred egress redirect dev geneve0
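
The class:type:data format with comma-delimited options can be illustrated
with a small parser. This is a sketch, not the tc implementation:
parse_geneve_opts is a hypothetical name, and treating all three fields as
hexadecimal is an assumption based on the example values above.

```python
def parse_geneve_opts(spec: str) -> list:
    """Split a class:type:data[,class:type:data...] string into tuples.

    Assumes all three fields are hexadecimal; data is returned as bytes.
    """
    options = []
    for item in spec.split(","):
        opt_class, opt_type, data = item.split(":")
        options.append((int(opt_class, 16), int(opt_type, 16), bytes.fromhex(data)))
    return options


for opt_class, opt_type, data in parse_geneve_opts("0102:80:00800022,0102:80:00800022"):
    print(hex(opt_class), hex(opt_type), data.hex())
```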

Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-07-06 09:10:05 -07:00
Moshe Shemesh
13925ae9eb devlink: Add param command support
Add support for configuration parameters set and show.
Each parameter can be either generic or driver-specific.
The user can retrieve data on these configuration parameters by devlink
param show command and can set new value to a configuration parameter
by devlink param set command.
The configuration parameters can be set in different configuration
modes:
  runtime - set while driver is running, no reset required.
  driverinit - applied while driver initializes, requires restart
               driver by devlink reload command.
  permanent - written to device's non-volatile memory, hard reset
              required to apply.

New commands added:
  devlink dev param show [DEV name PARAMETER]
  devlink dev param set DEV name PARAMETER value VALUE
			    cmode { permanent | driverinit | runtime }

Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-07-06 08:43:28 -07:00
David Ahern
22ddbd8204 Update kernel headers
Update kernel headers to commit
ab8565af68001 ("Merge branch 'IP-listification-follow-ups'")

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-07-06 08:42:22 -07:00
Nikolay Aleksandrov
05001bcfab bridge: add support for isolated option
This patch adds support for the new isolated port option which, if set,
would allow the isolated ports to communicate only with non-isolated
ports and the bridge device. The option can be set via the bridge or ip
link type bridge_slave commands, e.g.:
$ ip link set dev eth0 type bridge_slave isolated on
$ bridge link set dev eth0 isolated on

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-07-06 07:58:41 -07:00
David Ahern
f2bfb31bef Merge branch 'iproute2-master' into iproute2-next
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-06-21 08:12:39 -07:00
Keara Leibovitz
4757a54799 tc: jsonify nat action
Add json output support for nat action

Example output:

~$ $TC actions add action nat egress 10.10.10.1 20.20.20.2 index 2
~$ $TC actions add action nat ingress 100.100.100.1/32 200.200.200.2 \
	continue index 99
~$ $TC -j actions ls action nat

[{
	"total acts": 2
}, {
	"actions": [{
		"order": 0,
		"type": "nat",
		"direction": "egress",
		"old_addr": "10.10.10.1/32",
		"new_addr": "20.20.20.2",
		"control_action": {
			"type": "pass"
		},
		"index": 2,
		"ref": 1,
		"bind": 0
	}, {
		"order": 1,
		"type": "nat",
		"direction": "ingress",
		"old_addr": "100.100.100.1/32",
		"new_addr": "200.200.200.2",
		"control_action": {
			"type": "continue"
		},
		"index": 99,
		"ref": 1,
		"bind": 0
	}]
}]
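
A script consuming this output might extract the nat actions as sketched
below. nat_actions is a hypothetical helper, and the sample string is a
condensed copy of the output above rather than freshly captured data.

```python
import json


def nat_actions(tc_json: str) -> list:
    """Collect (index, direction, old_addr, new_addr) for each nat action."""
    result = []
    for block in json.loads(tc_json):
        for act in block.get("actions", []):
            if act.get("type") == "nat":
                result.append(
                    (act["index"], act["direction"], act["old_addr"], act["new_addr"])
                )
    return result


sample = """[{"total acts": 2}, {"actions": [
  {"order": 0, "type": "nat", "direction": "egress",
   "old_addr": "10.10.10.1/32", "new_addr": "20.20.20.2",
   "control_action": {"type": "pass"}, "index": 2, "ref": 1, "bind": 0},
  {"order": 1, "type": "nat", "direction": "ingress",
   "old_addr": "100.100.100.1/32", "new_addr": "200.200.200.2",
   "control_action": {"type": "continue"}, "index": 99, "ref": 1, "bind": 0}
]}]"""
print(nat_actions(sample))
```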

Signed-off-by: Keara Leibovitz <kleib@mojatatu.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-06-20 10:20:34 -07:00
Eric S. Raymond
a85f921ae5 devlink.8, translate unparseable callout syntax to parseable form.
Signed-off-by: Eric S. Raymond <esr@thyrsus.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-06-20 09:41:41 -07:00
Vlad Buslov
b133392468 tc: fix batch force option
When sending an accumulated compound command results in an error, check the
'force' option before exiting. Move the return code check after putting batch
bufs and freeing iovs to prevent a memory leak. Break from the loop instead of
returning an error code, to allow cleanup at the end of the batch function.
Don't reset the ret code on each iteration.
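
The control flow described above can be sketched roughly as follows. This is
not the actual tc code: run_batch(), send_fn, demo_send() and the malloc'ed
buffer are hypothetical stand-ins for the real batch buffers and iovs, used
only to show the ordering (free first, then check the result, honour 'force',
break instead of return, keep ret across iterations).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sender type: returns 0 on success, non-zero on failure. */
typedef int (*send_fn)(const char *cmd);

static int sent;	/* number of commands handed to demo_send() */

/* Hypothetical sender for illustration: fails on the command "bad". */
static int demo_send(const char *cmd)
{
	sent++;
	return strcmp(cmd, "bad") == 0;
}

/* Returns 0 if every command succeeded, non-zero otherwise. */
static int run_batch(const char *const *cmds, size_t n, bool force,
		     send_fn send)
{
	int ret = 0;	/* set on failure, never reset inside the loop */

	for (size_t i = 0; i < n; i++) {
		size_t len = strlen(cmds[i]) + 1;
		char *buf = malloc(len);	/* stands in for a batch buffer */
		int err;

		if (!buf) {
			ret = 1;
			break;
		}
		memcpy(buf, cmds[i], len);

		err = send(buf);
		free(buf);	/* release resources before checking the result */

		if (err) {
			ret = 1;	/* remember the failure */
			if (!force)
				break;	/* leave the loop, do not return */
		}
	}

	/* Any cleanup shared by all exit paths would run here. */
	return ret;
}
```

With the commands "ok", "bad", "ok", this sketch stops after the second
command when force is false and processes all three when force is true; in
both cases the function reports the failure.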

Fixes: 485d0c6001 ("tc: Add batchsize feature for filter and actions")
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Chris Mi <chrism@mellanox.com>
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-06-20 09:32:36 -07:00
Subash Abhinov Kasiviswanathan
2ecb61a0c2 ip-xfrm: Add support for OUTPUT_MARK
This patch adds support for OUTPUT_MARK in xfrm state to exercise the
functionality added by kernel commit 077fbac405bf
("net: xfrm: support setting an output mark.").

Sample output:

(with mark and output-mark)
src 192.168.1.1 dst 192.168.1.2
        proto esp spi 0x00004321 reqid 0 mode tunnel
        replay-window 0 flag af-unspec
        mark 0x10000/0x3ffff output-mark 0x20000
        auth-trunc xcbc(aes) 0x3ed0af408cf5dcbf5d5d9a5fa806b211 96
        enc cbc(aes) 0x3ed0af408cf5dcbf5d5d9a5fa806b233
        anti-replay context: seq 0x0, oseq 0x0, bitmap 0x00000000

(with mark only)
src 192.168.1.1 dst 192.168.1.2
        proto esp spi 0x00004321 reqid 0 mode tunnel
        replay-window 0 flag af-unspec
        mark 0x10000/0x3ffff
        auth-trunc xcbc(aes) 0x3ed0af408cf5dcbf5d5d9a5fa806b211 96
        enc cbc(aes) 0x3ed0af408cf5dcbf5d5d9a5fa806b233
        anti-replay context: seq 0x0, oseq 0x0, bitmap 0x00000000

(with output-mark only)
src 192.168.1.1 dst 192.168.1.2
        proto esp spi 0x00004321 reqid 0 mode tunnel
        replay-window 0 flag af-unspec
        output-mark 0x20000
        auth-trunc xcbc(aes) 0x3ed0af408cf5dcbf5d5d9a5fa806b211 96
        enc cbc(aes) 0x3ed0af408cf5dcbf5d5d9a5fa806b233
        anti-replay context: seq 0x0, oseq 0x0, bitmap 0x00000000

(no mark and output-mark)
src 192.168.1.1 dst 192.168.1.2
        proto esp spi 0x00004321 reqid 0 mode tunnel
        replay-window 0 flag af-unspec
        auth-trunc xcbc(aes) 0x3ed0af408cf5dcbf5d5d9a5fa806b211 96
        enc cbc(aes) 0x3ed0af408cf5dcbf5d5d9a5fa806b233
        anti-replay context: seq 0x0, oseq 0x0, bitmap 0x00000000

v1->v2: Moved the XFRMA_OUTPUT_MARK print after XFRMA_MARK in
xfrm_xfrma_print() as mentioned by Lorenzo

v2->v3: Fix one help formatting error as mentioned by Lorenzo.
Keep mark and output-mark on the same line and add man page info as
mentioned by David.

Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-06-18 06:37:00 -07:00
Daniele Palmas
46c16a5d1e ip: add rmnet initial support
This patch adds basic support for Qualcomm rmnet devices.

Signed-off-by: Daniele Palmas <dnlplm@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-06-15 11:15:14 -07:00
Patrick Talbert
cad73425d8 ipaddress: strengthen check on 'label' input
As mentioned in the ip-address man page, an address label must
be equal to the device name or prefixed by the device name
followed by a colon. Currently the only check on this input is
to see if the device name appears at the beginning of the label
string.

This commit adds an additional check to ensure that the label either
equals the device name or continues with a colon right after it.
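
A minimal sketch of such a check, assuming a helper named
label_matches_dev() (a hypothetical name, not necessarily what the patch
uses): the label must start with the device name, and the character right
after that prefix must be either the end of the string or a colon.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper: returns true when label equals dev, or when label
 * starts with dev immediately followed by ':'.
 */
static bool label_matches_dev(const char *label, const char *dev)
{
	size_t len = strlen(dev);

	/* The old check: the device name must appear at the start. */
	if (strncmp(label, dev, len) != 0)
		return false;

	/* The added check: nothing else, or a colon, may follow the name. */
	return label[len] == '\0' || label[len] == ':';
}
```

Under this sketch "eth0" and "eth0:1" are accepted for device eth0, while
"eth01" is rejected even though it begins with the device name.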

Signed-off-by: Patrick Talbert <ptalbert@redhat.com>
Suggested-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-06-15 11:14:19 -07:00
Hoang Le
5887ff0922 rdma: sync some IP headers with glibc
In commit 9a362cc71a, a new userspace header
  (i.e. rdma/rdma_user_cm.h -> linux/in6.h)
is included before the kernel space header
  (i.e. utils.h -> resolv.h -> netinet/in.h).

This leaves some IP headers out of sync, and compilation fails with
errors about the redefinition of some IP structs.

This commit reorders the includes to keep the headers in sync.

Signed-off-by: Hoang Le <hoang.h.le@dektech.com.au>
Acked-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-06-15 11:11:51 -07:00
Hoang Le
a56e0db7e8 tipc: JSON support for tipc link printouts
Add json output support for tipc link command

Example output:
$tipc -j -p link list

[ {
        "broadcast-link": "up",
        "1.1.1:bridge-1.1.104:eth0": "up",
        "1.1.1:bridge-1.1.105:eth0": "up",
        "1.1.1:bridge-1.1.106:eth0": "up"
    } ]

--------------------
$tipc -j -p link stat show link broadcast-link

[ {
        "link": "broadcast-link",
        "window": 50,
        "rx packets": {
            "rx packets": 0,
            "fragments": 0,
            "fragmented": 0,
            "bundles": 0,
            "bundled": 0
        },
        "tx packets": {
            "tx packets": 0,
            "fragments": 0,
            "fragmented": 0,
            "bundles": 0,
            "bundled": 0
        },
        "rx naks": {
            "rx naks": 0,
            "defs": 0,
            "dups": 0
        },
        "tx naks": {
            "tx naks": 0,
            "acks": 0,
            "retrans": 0
        },
        "congestion link": 0,
        "send queue max": 0,
        "avg": 0
    } ]

v2:
    Replace variable 'json_flag' by 'json' declared in include/utils.h

v3:
    Update manual page

Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Hoang Le <hoang.h.le@dektech.com.au>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-06-13 20:45:59 -07:00
Hoang Le
1304f50a5b tipc: JSON support for showing nametable
Add json output support for nametable show

Example output:
$tipc -j -p nametable show

[ {
        "type": 0,
        "lower": 16781313,
        "upper": 16781313,
        "scope": "zone",
        "port": 0,
        "node": ""
    },{
        "type": 0,
        "lower": 16781416,
        "upper": 16781416,
        "scope": "cluster",
        "port": 0,
        "node": ""
    } ]

v2:
    Replace variable 'json_flag' by 'json' declared in include/utils.h
    Add new parameter '-pretty' to support pretty output

v3:
    Update manual page

Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Hoang Le <hoang.h.le@dektech.com.au>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-06-13 20:45:38 -07:00
Donald Sharp
a313455c6c iproute2: Add support for a few routing protocols
Add support for the following routing protocols to iproute2:

BGP
ISIS
OSPF
RIP
EIGRP

Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-06-11 11:18:30 -07:00
David Ahern
ee095a417e Merge branch 'iproute2-master' into iproute2-next
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-06-10 07:30:32 -07:00
Stephen Hemminger
776f1813b5 uapi: update headers from linux-net
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-06-08 10:27:13 -07:00
Stephen Hemminger
17678d3059 Merge ../iproute2-next 2018-06-08 10:27:04 -07:00
Stephen Hemminger
2d3dd6f6c1 v4.17.0 2018-06-08 10:11:50 -07:00
Stephen Hemminger
4be85d574e uapi: update bpf.h to include padding
Last minute upstream 4.17 change.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-06-08 10:09:21 -07:00
Hoang Le
313ce6949c tipc: TIPC_NLA_LINK_NAME value pass on nesting entry TIPC_NLA_LINK
In commit 94f6a80 on next-net, the TIPC_NLA_LINK_NAME attribute is
retrieved and validated via the TIPC_NLA_LINK nesting entry in
tipc_nl_node_get_link().
Accordingly, the TIPC_NLA_LINK_NAME value passed by the
tipc link get command must follow the above hierarchy.

Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Hoang Le <hoang.h.le@dektech.com.au>
Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-06-08 10:07:13 -07:00
Nicolas Dichtel
974ef93bf1 iplink: enable to specify a name for the link-netns
The 'link-netnsid' argument needs a number. Add 'link-netns' when the user
wants to use the iproute2 netns name instead of the nsid.

Example:
ip link add ipip1 link-netns foo type ipip remote 10.16.0.121 local 10.16.0.249

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-06-08 10:06:21 -07:00