Commit Graph

3594 Commits

Author SHA1 Message Date
Amir Vadai
f3e1b2448a pedit: Introduce ipv6 support
Add support for modifying IPv6 headers using pedit.

Signed-off-by: Amir Vadai <amir@vadai.me>
2017-05-15 15:05:20 -07:00
Amir Vadai
a13426fe1a pedit: Check for extended capability in protocol parser
Do not allow using eth and udp header types if non-extended pedit kABI
is being used. Other protocol parsers already have this check.

Signed-off-by: Amir Vadai <amir@vadai.me>
2017-05-15 15:05:20 -07:00
Amir Vadai
cdca191862 pedit: Do not allow using retain for too big fields
Using retain for fields longer than 32 bits is not supported.
Do not allow user to do it.

Signed-off-by: Amir Vadai <amir@vadai.me>
2017-05-15 15:05:20 -07:00
Amir Vadai
290cdc058d pedit: Fix a typo in warning
'ex' attribute should be placed after 'action pedit' and not after
'munge'.

Signed-off-by: Amir Vadai <amir@vadai.me>
2017-05-15 15:05:20 -07:00
Girish Moodalbail
01c659955a vxlan: Add support for modifying vxlan device attributes
Ability to change vxlan device attributes was added to kernel through
commit 8bcdc4f3a20b ("vxlan: add changelink support"), however one
cannot do the same through ip(8) command.  Changing the allowed vxlan
device attributes using 'ip link set dev <vxlan_name> type vxlan
<allowed_attributes>' currently fails with 'operation not supported'
error.  This failure is due to the incorrect rtnetlink message
construction for the 'ip link set' operation.

The vxlan_parse_opt() callback function is called for parsing options
for both 'ip link add' and 'ip link set'. For the 'add' case, we pass
down default values for those attributes that were not provided as CLI
options. However, for the 'set' case we should be only passing down the
explicitly provided attributes and not any other (default) attributes.

Signed-off-by: Girish Moodalbail <girish.moodalbail@oracle.com>
2017-05-11 11:11:08 -07:00
David Ahern
aac40403ea ip: mpls: fix printing of mpls labels
If the kernel returns more labels than iproute2 expects, none of
the labels are printed and (null) is shown instead:
    $ ip -f mpls ro ls
    101 as to (null) via inet 172.16.2.2 dev virt12
    201 as to 202/203 via inet6 2001:db8:2::2 dev virt12

Remove the use of MPLS_MAX_LABELS and rely on buffer length that is
passed to mpls_ntop. With this change ip can print the label stack
returned by the kernel up to 255 characters (limit is due to size of
buf passed in) which amounts to 31 labels with a separator.

With this change the above is:
    $ ip/ip -f mpls ro ls
    101 as to 102/103/104/105/106/107/108/109/110 via inet 172.16.2.2 dev virt12

Signed-off-by: David Ahern <dsahern@gmail.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
2017-05-11 11:08:02 -07:00
Alexander Alemayhu
5be9971c73 tc: bpf: add ppc64 and sparc64 to list of archs with eBPF support
sparc64 support was added in 7a12b5031c6b (sparc64: Add eBPF JIT., 2017-04-17)[0]
and ppc64 in 156d0e290e96 (powerpc/ebpf/jit: Implement JIT compiler for extended BPF, 2016-06-22)[1].

[0]: https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git/commit/?id=7a12b5031c6b
[1]: https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git/commit/?id=156d0e290e96
Signed-off-by: Alexander Alemayhu <alexander@alemayhu.com>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
2017-05-08 23:05:35 -07:00
Or Gerlitz
e57285b81a tc: Reflect HW offload status
Currently there is no way of querying whether a filter is
offloaded to HW or not when using "both" policy (where none
of skip_sw or skip_hw flags are set by user-space).

Add two new flags, "in hw" and "not in hw" such that user
space can determine if a filter is actually offloaded to
hw or not. The "in hw" UAPI semantics was chosen so it's
similar to the "skip hw" flag logic.

If none of these two flags are set, this signals running
over older kernel.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
2017-05-05 09:49:25 -07:00
Stephen Hemminger
76557951f5 update kernel headers during 4.12 merge window
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-05-05 09:48:54 -07:00
Arkadi Sharshevsky
153c1a9b21 devlink: Add support for pipeline debug (dpipe)
Add support for pipeline debug (dpipe). The headers are used both the
gain visibillity into the headers supported by the hardware, and to
build the headers/field database which is used by other commands.

Examples:

First we can see the headers supported by the hardware:

$devlink dpipe header show pci/0000:03:00.0

pci/0000:03:00.0:
  name mlxsw_meta
  field:
    name erif_port bitwidth 32 mapping_type ifindex
    name l3_forward bitwidth 1
    name l3_drop bitwidth 1

Note that mapping_type is presented only if relevant. Also the header/
field id's are reported by the kernel they are not shown by default.
They can be observed by using the -v option. Also the headers scope
(global/local) is specified.

$devlink -v dpipe header show pci/0000:03:00.0

pci/0000:03:00.0:
  name mlxsw_meta id 0 global false
  field:
    name erif_port id 0 bitwidth 32 mapping_type ifindex
    name l3_forward id 1 bitwidth 1
    name l3_drop id 2 bitwidth 1

Second we can examine the tables supported by the hardware. In order
to dump all the tables no table name should be provided:
$devlink dpipe table show pci/0000:03:00.0

In order to examine specific table its name have to be specified:
$devlink dpipe table show pci/0000:03:00.0 name erif

pci/0000:03:00.0:
  name mlxsw_erif size 800 counters_enabled true
  match:
    type field_exact header mlxsw_meta field erif_port mapping ifindex
  action:
    type field_modify header mlxsw_meta field l3_forward
    type field_modify header mlxsw_meta field l3_drop

To enable/disable counters on the table:
$devlink dpipe table set pci/0000:03:00.0 name erif counters enable
$devlink dpipe table set pci/0000:03:00.0 name erif counters disable

In order to see the current entries in the hardware for specific table:
$devlink dpipe table dump pci/0000:03:00.0 name erif

pci/0000:03:00.0:
  index 0 counter 0
  match_value:
    type field_exact header mlxsw_meta field erif_port mapping ifindex mapping_value 383 value 0
  action_value:
    type field_modify header mlxsw_meta field l3_forward value 1

  index 1 counter 0
  match_value:
    type field_exact header mlxsw_meta field erif_port mapping ifindex mapping_value 381 value 1
  action_value:
    type field_modify header mlxsw_meta field l3_forward value 1

In the above example the table contains two entries which does match
on erif port and forwards the packet or drop it (currently only the
forward count is implemented). The counter values are provided for
example. In case the counting is not enabled on the table the counters
will not be available.

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
2017-05-03 09:29:43 -07:00
Arkadi Sharshevsky
4f10cede93 devlink: Change netlink attribute validation
Currently the netlink attribute resolving is done by a sequence of
if's. Change the attribute resolving to table lookup.

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Greg Rose <gvrose8192@gmail.com>
2017-05-03 09:29:42 -07:00
Phil Sutter
6a78ef97b6 man: ip.8: Document -brief flag
Brief output is especially useful for new users, so at least mention
it's existence in ip man page.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2017-05-03 09:28:40 -07:00
Stephen Hemminger
5bffc12ef4 Merge branch 'net-next' 2017-05-03 09:28:10 -07:00
Stephen Hemminger
cbc7929b21 v4.11.0 2017-05-01 09:32:25 -07:00
Boris Pismenny
cfd2e727f0 ip xfrm: Add xfrm state crypto offload
syntax:
ip xfrm state .... offload dev <if-name> dir <in or out>

Example to add inbound offload:
  ip xfrm state .... offload dev mlx0 dir in
Example to add outbound offload:
  ip xfrm state .... offload dev mlx0 dir out

Signed-off-by: Boris Pismenny <borisp@mellanox.com>
Signed-off-by: Ilan Tayari <ilant@mellanox.com>
2017-05-01 09:30:25 -07:00
Daniel Borkmann
a872b870a5 bpf: add support for generic xdp
Follow-up to commit c7272ca720 ("bpf: add initial support for
attaching xdp progs") to also support generic XDP. This adds an
indicator for loaded generic XDP programs when programs are loaded
as shown in c7272ca720, but the driver still lacks native XDP
support.

  # ip link
  [...]
  3: eno1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 xdpgeneric qdisc [...]
      link/ether 0c:c4:7a:03:f9:25 brd ff:ff:ff:ff:ff:ff
  [...]

In case the driver does support native XDP, but the user wants
to load the program as generic XDP (e.g. for testing purposes),
then this can be done with the same semantics as in c7272ca720,
but with 'xdpgeneric' instead of 'xdp' command for loading:

  # ip -force link set dev eno1 xdpgeneric obj xdp.o

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: David S. Miller <davem@davemloft.net>
2017-05-01 09:28:19 -07:00
Stephen Hemminger
7ff1fce549 update headers to 4.11 net-next
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-05-01 09:27:46 -07:00
Stephen Hemminger
d2b9100a08 Merge branch 'master' into net-next 2017-05-01 09:26:51 -07:00
Stephen Hemminger
1e600da057 pedit: fix whitespace
Add newlines to break long lines.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-05-01 09:25:22 -07:00
Or Gerlitz
3d2a7781ec tc/pedit: p_udp: introduce pedit udp support
For example, forward udp traffic destined to port 999 to veth0 and set
tcp port to 888:
$ tc filter add dev enp0s9 protocol ip parent ffff: \
    flower \
      ip_proto udp \
      dst_port 999 \
    action pedit ex munge \
      udp dport set 888 \
    action mirred egress \
      redirect dev veth0

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Amir Vadai <amir@vadai.me>
2017-05-01 09:22:16 -07:00
Amir Vadai
2c6eb12ab8 tc/pedit: p_tcp: introduce pedit tcp support
For example, forward tcp traffic destined to port 80 to veth0 and set
tcp port to 8080:
$ tc filter add dev enp0s9 protocol ip parent ffff: \
    flower \
      ip_proto tcp \
      dst_port 80 \
    action pedit ex munge \
      tcp dport set 8080 \
    action mirred egress \
      redirect dev veth0

Signed-off-by: Amir Vadai <amir@vadai.me>
2017-05-01 09:22:16 -07:00
Amir Vadai
3cd5149ecd tc/pedit: p_eth: ETH header editor
For example, forward tcp traffic to veth0 and set
destination mac address to 11:22:33:44:55:66 :
$ tc filter add dev enp0s9 protocol ip parent ffff: \
    flower \
      ip_proto tcp \
    action pedit ex munge \
      eth dst set 11:22:33:44:55:66 \
    action mirred egress \
      redirect dev veth0

Signed-off-by: Amir Vadai <amir@vadai.me>
2017-05-01 09:22:16 -07:00
Amir Vadai
fa4652ff3b tc/pedit: Support fields bigger than 32 bits
Make parse_val() accept fields up to 128 bits long, this should be
enough for current use cases and involves a minimal change to code.

Signed-off-by: Amir Vadai <amir@vadai.me>
2017-05-01 09:22:16 -07:00
Amir Vadai
8d193d9607 tc/pedit: p_ip: introduce editing ttl header
Enable user to edit IP header ttl field.

For example, to forward any TCP packet and decrease its TTL by one:
$ tc filter add dev enp0s9 protocol ip parent ffff: \
    flower \
      ip_proto tcp \
    action pedit ex munge \
      ip ttl add 0xff pipe \
    action mirred egress \
      redirect dev veth0

Signed-off-by: Amir Vadai <amir@vadai.me>
2017-05-01 09:22:16 -07:00
Amir Vadai
c05ddaf9e0 tc/pedit: Introduce 'add' operation
This command could be useful to increase/decrease fields value.

Signed-off-by: Amir Vadai <amir@vadai.me>
2017-05-01 09:22:16 -07:00
Amir Vadai
7c71a40cbd tc/pedit: Extend pedit to specify offset relative to mac/transport headers
Utilize the extended pedit netlink to set an offset relative to a
specific header type. Old netlink only enabled the user to set
approximated  offset relative to the IPv4 header.

To use this extended functionality need to use the 'ex' keyword after
'pedit' and before any 'munge'.
e.g:
$ tc filter add dev ens9 protocol ip parent ffff: \
    flower \
      ip_proto udp \
      dst_port 80 \
    action pedit ex munge \
      ip dst set 1.1.1.1 \
      pipe \
    action mirred egress redirect dev veth0

Signed-off-by: Amir Vadai <amir@vadai.me>
2017-05-01 09:22:16 -07:00
Amir Vadai
51536ebbe8 tc/pedit: Fix a typo in pedit usage message
Signed-off-by: Amir Vadai <amir@vadai.me>
2017-05-01 09:22:16 -07:00
Stephen Hemminger
bb6ab47b16 iplink: whitespace cleanup
Break lines to conform to 80 col guideline.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-05-01 09:13:09 -07:00
Zhang Shengju
432b92a702 iplink: add support for IFLA_CARRIER attribute
Add support to set IFLA_CARRIER attribute.

Signed-off-by: Zhang Shengju <zhangshengju@cmss.chinamobile.com>
2017-05-01 09:06:54 -07:00
Michal Kubeček
6ec14f1abb routel: fix infinite loop in line parser
As noticed by one of the few users of routel script, it ends up in an
infinite loop when they pull out the cable from the NIC used for some
route. This is caused by its parser expecting the line of "ip route show"
output consists of "key value" pairs (except for the initial target range),
together with an old trap of Bourne style shells that "shift 2" does
nothing if there is only one argument left. Some keywords, e.g. "linkdown",
are not followed by a value.

Improve the parser to

  (1) only set variables for keywords we care about
  (2) recognize (currently) known keywords without value

This is still far from perfect (and certainly not future proof) but to
fully fix the script, one would probably have to rewrite the logic
completely (and I'm not sure it's worth the effort).

Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
2017-04-27 16:42:29 -07:00
Phil Sutter
843fc90068 man: ip-rule.8: Further clarify how to interpret priority value
Despite the past changes, users seemed to get confused by the seemingly
contradictory relation of priority value and actual rule priority.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2017-04-24 11:43:09 -07:00
Craig Gallek
ad4b1425c3 iplink: Expose IFLA_*_FWMARK attributes for supported link types
This attribute allows the administrator to adjust the packet marking
attribute of tunnels that support policy based routing.

Signed-off-by: Craig Gallek <kraig@google.com>
2017-04-23 09:14:46 -07:00
Stephen Hemminger
590dde3a98 Merge branch 'master' into net-next 2017-04-23 09:14:35 -07:00
Craig Gallek
35893864c8 gre6: fix copy/paste bugs in GREv6 attribute manipulation
Fixes: af89576d7a8c("iproute2: GRE over IPv6 tunnel support.")
Signed-off-by: Craig Gallek <kraig@google.com>
2017-04-23 09:13:07 -07:00
Jamal Hadi Salim
fd8b3d2c1b actions: Add support for user cookies
Make use of 128b user cookies

Introduce optional 128-bit action cookie.
Like all other cookie schemes in the networking world (eg in protocols
like http or existing kernel fib protocol field, etc) the idea is to
save user state that when retrieved serves as a correlator. The kernel
_should not_ intepret it. The user can store whatever they wish in the
128 bits.

Sample exercise(showing variable length use of cookie)

.. create an accept action with cookie a1b2c3d4
sudo $TC actions add action ok index 1 cookie a1b2c3d4

.. dump all gact actions..
sudo $TC -s actions ls action gact

    action order 0: gact action pass
     random type none pass val 0
     index 1 ref 1 bind 0 installed 5 sec used 5 sec
    Action statistics:
    Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
    backlog 0b 0p requeues 0
    cookie a1b2c3d4

.. bind the accept action to a filter..
sudo $TC filter add dev lo parent ffff: protocol ip prio 1 \
u32 match ip dst 127.0.0.1/32 flowid 1:1 action gact index 1

... send some traffic..
$ ping 127.0.0.1 -c 3
PING 127.0.0.1 (127.0.0.1) 56(84) bytes of data.
64 bytes from 127.0.0.1: icmp_seq=1 ttl=64 time=0.020 ms
64 bytes from 127.0.0.1: icmp_seq=2 ttl=64 time=0.027 ms
64 bytes from 127.0.0.1: icmp_seq=3 ttl=64 time=0.038 ms

Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
2017-04-23 09:10:02 -07:00
Stephen Hemminger
0e3cdd9ce0 remove unused header file sysctl.h
Not referred to in current source tree.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-04-21 17:47:30 -07:00
Stephen Hemminger
5b0aa88737 update kernel headers from net-next
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-04-21 17:41:33 -07:00
David Lebrun
e1b7f883e5 man: add documentation for IPv6 SR commands
This patch adds information about seg6 encapsulation in the ip-route
manual, as well as the ip-sr manual page.

Signed-off-by: David Lebrun <david.lebrun@uclouvain.be>
2017-04-16 10:21:43 -07:00
David Lebrun
e8493916a8 iproute: add support for SR-IPv6 lwtunnel encapsulation
This patch adds support for SEG6 encapsulation type
("ip route add ... encap seg6 ...").

Signed-off-by: David Lebrun <david.lebrun@uclouvain.be>
2017-04-16 10:21:43 -07:00
David Lebrun
9386332823 ip: add ip sr command to control SR-IPv6 internal structures
This patch adds commands to support the tunnel source properties
("ip sr tunsrc") and the HMAC key -> secret, algorithm binding
("ip sr hmac").

Signed-off-by: David Lebrun <david.lebrun@uclouvain.be>
2017-04-16 10:21:43 -07:00
Stephen Hemminger
85dd6ab510 add seg6.h kernel headers
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-04-16 10:21:34 -07:00
Stephen Hemminger
2c6a0636e2 Update kernel headers from 4.11 net-next
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-04-16 10:19:32 -07:00
David Ahern
0da8250be8 ip vrf: Add command name next to pid
'ip vrf pids' is used to list processes bound to a vrf, but it only
shows the pid leaving a lot of work for the user. Add the command
name to the output. With this patch you get the more user friendly:

    $ ip vrf pids mgmt
     1121  ntpd
     1418  gdm-session-wor
     1488  gnome-session
     1491  dbus-launch
     1492  dbus-daemon
     1565  sshd
     ...

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
2017-04-16 10:19:32 -07:00
David Ahern
f443565f8d ip vrf: Add command name next to pid
'ip vrf pids' is used to list processes bound to a vrf, but it only
shows the pid leaving a lot of work for the user. Add the command
name to the output. With this patch you get the more user friendly:

    $ ip vrf pids mgmt
     1121  ntpd
     1418  gdm-session-wor
     1488  gnome-session
     1491  dbus-launch
     1492  dbus-daemon
     1565  sshd
     ...

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
2017-04-16 10:06:33 -07:00
David Ahern
c6858ef431 ip netconf: show all families on dev request
Currently specifying a device to ip netconf and it dumps only values
for IPv4. Change this to dump data for all families unless a specific
family is given.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
2017-04-14 16:00:15 -07:00
David Ahern
f052f5dfe0 ip netconf: Show all address families by default in dumps
Currently, 'ip netconf' only shows ipv4 and ipv6 netconf settings. If IPv6
is not enabled, the dump ends with
    RTNETLINK answers: Operation not supported

when IPv6 request is attempted. Further, if the mpls_router module is also
loaded a separate request is needed to get MPLS settings.

To make this better going forward, use the new PF_UNSPEC dump all option
if the kernel supports it. If the kernel does not, it sets NLMSG_ERROR and
returns EOPNOTSUPP which is trapped and we fall back to the existing output
to maintain compatibility with existing kernels.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
2017-04-14 16:00:15 -07:00
David Ahern
3ad6d17638 netlink: Add flag to suppress print of nlmsg error
Allow callers of the dump API to handle nlmsg errors (e.g., an
unsupported feature). Setting RTNL_HANDLE_F_SUPPRESS_NLERR in the
rtnl_handle avoids unnecessary messages to the users in some case.
For example,

  RTNETLINK answers: Operation not supported

when probing for support of a new feature.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
2017-04-14 16:00:15 -07:00
Stephen Hemminger
dfb60ddd29 Merge branch 'master' into net-next 2017-04-14 15:59:12 -07:00
Stephen Hemminger
2d3af1675d netem: fix out of bounds access in maketable
The maketable program used to generate one of the configuration
files at build time for netem would access past the end of the array
for one input value. This is a bug inherited from original NISTnet.
Just fold the value, like other code there.

This is not a runtime error security problem.
It only impacts the build process if the build machine
had extra hardening enabled.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-04-12 10:13:14 -07:00
Robert Shearman
9688cf3b7a iproute: Add support for MPLS LWT ttl attribute
Add support for setting and displaying the ttl attribute
for MPLS IP lighweight tunnels.

Signed-off-by: Robert Shearman <rshearma@brocade.com>
Acked-by: David Ahern <dsa@cumulusnetworks.com>
2017-04-12 10:02:15 -07:00