Commit Graph

1094 Commits

Author SHA1 Message Date
Luca Boccassi
508f3c231e Use libbsd for strlcpy if available
If libc does not provide strlcpy check for libbsd with pkg-config to
avoid relying on inline version.

Signed-off-by: Luca Boccassi <bluca@debian.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-11-01 12:47:03 -07:00
David Ahern
6e221408e6 Merge branch 'iproute2-master' into iproute2-next
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-10-23 10:55:09 -07:00
Phil Sutter
737b8258b3 tc: htb: Print default value in hex
Value of 'default' is assumed to be hexadecimal when parsing, so
consequently it should be printed in hex as well. This is a regression
introduced when adding JSON output.

As requested, also change JSON output to print the value as hex string.

Fixes: f354fa6aa5 ("tc: jsonify htb qdisc")
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-10-23 10:07:10 -07:00
Phil Sutter
6358bbc381 tc: Remove pointless assignments in batch()
All these assignments are later overwritten without reading in between,
so just drop them.

Fixes: 485d0c6001 ("tc: Add batchsize feature for filter and actions")
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-10-22 10:05:43 -07:00
David Ahern
cd554f2c2f Tree wide: Drop sockaddr_nl arg
No function, filter, or print function uses the sockaddr_nl arg,
so just drop it.

Signed-off-by: David Ahern <dsahern@gmail.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
2018-10-22 09:43:48 -07:00
Stephen Hemminger
f5a398bf17 tc: spelling fixes
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-10-18 13:22:51 -07:00
David Ahern
0d30c1f8d4 Merge branch 'master' into iproute2-next
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-10-13 19:31:37 -07:00
Jakub Kicinski
650a10e032 tc: jsonify output of q_fifo
Print limits correctly in JSON context.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-10-08 09:22:22 -07:00
Vinicius Costa Gomes
0dd1644935 tc: Add support for configuring the taprio scheduler
This traffic scheduler allows traffic classes states (transmission
allowed/not allowed, in the simplest case) to be scheduled, according
to a pre-generated time sequence. This is the basis of the IEEE
802.1Qbv specification.

Example configuration:

tc qdisc replace dev enp3s0 parent root handle 100 taprio \
          num_tc 3 \
	  map 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 \
	  queues 1@0 1@1 2@2 \
	  base-time 1528743495910289987 \
	  sched-entry S 01 300000 \
	  sched-entry S 02 300000 \
	  sched-entry S 04 300000 \
	  clockid CLOCK_TAI

The configuration format is similar to mqprio. The main difference is
the presence of a schedule, built by multiple "sched-entry"
definitions, each entry has the following format:

     sched-entry <CMD> <GATE MASK> <INTERVAL>

The only supported <CMD> is "S", which means "SetGateStates",
following the IEEE 802.1Qbv-2015 definition (Table 8-6). <GATE MASK>
is a bitmask where each bit is a associated with a traffic class, so
bit 0 (the least significant bit) being "on" means that traffic class
0 is "active" for that schedule entry. <INTERVAL> is a time duration
in nanoseconds that specifies for how long that state defined by <CMD>
and <GATE MASK> should be held before moving to the next entry.

This schedule is circular, that is, after the last entry is executed
it starts from the first one, indefinitely.

The other parameters can be defined as follows:

 - base-time: specifies the instant when the schedule starts, if
  'base-time' is a time in the past, the schedule will start at

 	      base-time + (N * cycle-time)

   where N is the smallest integer so the resulting time is greater
   than "now", and "cycle-time" is the sum of all the intervals of the
   entries in the schedule;

 - clockid: specifies the reference clock to be used;

The parameters should be similar to what the IEEE 802.1Q family of
specification defines.

Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: Jesus Sanchez-Palencia <jesus.sanchez-palencia@intel.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-10-07 10:32:08 -07:00
Vlad Buslov
f6b498f957 tc: flower: expose hardware offload count
Recently flower classifier was updated to expose count of devices that
filter is offloaded to. Add support to print this counter as 'in_hw_count'.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
2018-10-07 10:14:09 -07:00
Eelco Chaudron
5ac138324e tc_util: Add support for showing TCA_STATS_BASIC_HW statistics
Add support for showing hardware specific counters to easy
troubleshooting hardware offload.

$ tc -s filter show dev enp3s0np0 parent ffff:
filter protocol ip pref 1 flower chain 0
filter protocol ip pref 1 flower chain 0 handle 0x1
  eth_type ipv4
  dst_ip 2.0.0.0
  src_ip 1.0.0.0
  ip_flags nofrag
  in_hw
        action order 1: mirred (Egress Redirect to device eth1) stolen
        index 1 ref 1 bind 1 installed 0 sec used 0 sec
        Action statistics:
        Sent 534884742 bytes 8915697 pkt (dropped 0, overlimits 0 requeues 0)
        Sent software 187542 bytes 4077 pkt
        Sent hardware 534697200 bytes 8911620 pkt
        backlog 0b 0p requeues 0
        cookie 89173e6a44447001becfd486bda17e29

Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-10-02 14:45:33 -07:00
Pieter Jansen van Vuuren
56155d4df8 tc: f_flower: add geneve option match support to flower
Allow matching on options in Geneve tunnel headers.

The options can be described in the form
CLASS:TYPE:DATA/CLASS_MASK:TYPE_MASK:DATA_MASK, where CLASS is
represented as a 16bit hexadecimal value, TYPE as an 8bit
hexadecimal value and DATA as a variable length hexadecimal value.

e.g.
 # ip link add name geneve0 type geneve dstport 0 external
 # tc qdisc add dev geneve0 ingress
 # tc filter add dev geneve0 protocol ip parent ffff: \
     flower \
       enc_src_ip 10.0.99.192 \
       enc_dst_ip 10.0.99.193 \
       enc_key_id 11 \
       geneve_opts 0102:80:1122334421314151/ffff:ff:ffffffffffffffff \
       ip_proto udp \
       action mirred egress redirect dev eth1

Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-10-02 14:39:55 -07:00
David Ahern
34212c73b7 Merge branch 'iproute2-master' into iproute2-next
Conflicts:
	ip/iproute_lwtunnel.c

In addition to merge conflict between bd59e5b151 and 94a8722f2f,
updated the code added by the latter commit based on the change of the
former (ie., added ret = to the new rta_addattr_l).

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-09-20 17:53:27 -07:00
Toke Høiland-Jørgensen
2153e01f36 q_cake: Also print nonat, nowash and no-ack-filter keywords
Similar to the previous patch for no-split-gso, the negative keywords for
'nat', 'wash' and 'ack-filter' were not printed either. Add those well.

Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-09-14 11:32:46 -07:00
Toke Høiland-Jørgensen
b914fe5f1c q_cake: Add printing of no-split-gso option
When the GSO splitting was turned into dual split-gso/no-split-gso options,
the printing of the latter was left out. Add that, so output is consistent
with the options passed.

Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-09-12 12:59:38 -07:00
Stephen Hemminger
b85076cd74 lib: introduce print_nl
Common pattern in iproute commands is to print a line seperator
in non-json mode. Make that a simple function.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-09-11 08:29:33 -07:00
Caleb Raitto
40c2916fda tc/mqprio: Print extra info on invalid args.
Print the name of the argument that wasn't understood.

Signed-off-by: Caleb Raitto <caraitto@google.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-09-10 12:14:00 -07:00
Stephen Hemminger
ad618b7984 tc/fifo: remove unnecessary prototype
The prototype for prio_print_opt is already in tc_util.h

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-09-10 11:50:22 -07:00
Yousuk Seung
588dd51e2c q_netem: slotting with non-uniform distribution
Extend slotting with support for non-uniform distributions. This is
similar to netem's non-uniform distribution delay feature.

Syntax:
   slot distribution DISTRIBUTION DELAY JITTER [packets MAX_PACKETS] \
      [bytes MAX_BYTES]

The syntax and use of the distribution table is the same as in the
non-uniform distribution delay feature. A file DISTRIBUTION must be
present in TC_LIB_DIR (e.g. /usr/lib/tc) containing numbers scaled by
NETEM_DIST_SCALE. A random value x is selected from the table and it
takes DELAY + ( x * JITTER ) as delay. Correlation between values is not
supported.

Examples:
  Normal distribution delay with mean = 800us and stdev = 100us.
  > tc qdisc add dev eth0 root netem slot distribution normal \
    800us 100us

  Optionally set the max slot size in bytes and/or packets.
  > tc qdisc add dev eth0 root netem slot distribution normal \
    800us 100us bytes 64k packets 42

Signed-off-by: Yousuk Seung <ysseung@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Dave Taht <dave.taht@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-08-30 11:08:19 -07:00
Dave Taht
b6268fbd58 q_netem: support delivering packets in delayed time slots
Slotting is a crude approximation of the behaviors of shared media such
as cable, wifi, and LTE, which gather up a bunch of packets within a
varying delay window and deliver them, relative to that, nearly all at
once.

It works within the existing loss, duplication, jitter and delay
parameters of netem. Some amount of inherent latency must be specified,
regardless.

The new "slot" parameter specifies a minimum and maximum delay between
transmission attempts.

The "bytes" and "packets" parameters can be used to limit the amount of
information transferred per slot.

Examples of use:

tc qdisc add dev eth0 root netem delay 200us \
        slot 800us 10ms bytes 64k packets 42

A more correct example, using stacked netem instances and a packet limit
to emulate a tail drop wifi queue with slots and variable packet
delivery, with a 200Mbit isochronous underlying rate, and 20ms path
delay:

tc qdisc add dev eth0 root handle 1: netem delay 20ms rate 200mbit \
         limit 10000
tc qdisc add dev eth0 parent 1:1 handle 10:1 netem delay 200us \
         slot 800us 10ms bytes 64k packets 42 limit 512

Signed-off-by: Yousuk Seung <ysseung@google.com>
Signed-off-by: Dave Taht <dave.taht@gmail.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-08-30 11:07:46 -07:00
Dave Taht
abf70ef494 tc: support conversions to or from 64 bit nanosecond-based time
Using a 32 bit field to represent time in nanoseconds results in a
maximum value of about 4.3 seconds, which is well below many observed
delays in WiFi and LTE, and barely in the ballpark for a trip past the
Earth's moon, Luna.

Using 64 bit time fields in nanoseconds allows us to simulate
network diameters of several hundred light-years. However, only
conversions to and from ns, us, ms, and seconds are provided.

The iproute2 64 bit api uses signed values for time. Being able to
represent positive or negative time allows us to calculate +/- deltas
between, for example, the CLOCK_TAI and CLOCK_REALTIME clocks.

Time related utility functions in tc_util.c are moved to lib/utils.c.

Signed-off-by: Yousuk Seung <ysseung@google.com>
Signed-off-by: Dave Taht <dave.taht@gmail.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-08-30 11:04:38 -07:00
Florent Fourcot
2bfe28710e tc/htb: remove unused variable
Since introduction of htb module, this variable has never been used.

Signed-off-by: Florent Fourcot <florent.fourcot@wifirst.fr>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-08-30 08:00:45 -07:00
Mahesh Bandewar
5d5586b058 iproute: make clang happy
These are primarily fixes for "string is not string literal" warnings
/ errors (with -Werror -Wformat-nonliteral). This should be a no-op
change. I had to replace couple of print helper functions with the
code they call as it was becoming harder to eliminate these warnings,
however these helpers were used only at couple of places, so no
major change as such.

Signed-off-by: Mahesh Bandewar <maheshb@google.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-08-30 07:58:09 -07:00
Stephen Hemminger
a8e9f4ae14 tc: drop extern from function prototypes
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-08-20 16:01:31 -07:00
Phil Sutter
ff1ab8edf8 Make colored output configurable
Allow for -color={never,auto,always} to have colored output disabled,
enabled only if stdout is a terminal or enabled regardless of stdout
state.

Signed-off-by: Phil Sutter <phil@nwl.cc>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-08-20 08:54:06 -07:00
Phil Sutter
4d82962ccc Merge common code for conditionally colored output
Instead of calling enable_color() conditionally with identical check in
three places, introduce check_enable_color() which does it in one place.

Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-08-15 09:55:27 -07:00
Phil Sutter
0d0e0e0bef tc: Fix typo in check for colored output
The check used binary instead of boolean AND, which means colored output
was enabled only if the number of specified '-color' flags was odd.

Fixes: 2d165c0811 ("tc: implement color output")
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-08-15 09:54:32 -07:00
Nishanth Devarajan
141b55f854 Add SKB Priority qdisc support in tc(8)
sch_skbprio is a qdisc that prioritizes packets according to their skb->priority
field. Under congestion, it drops already-enqueued lower priority packets to
make space available for higher priority packets. Skbprio was conceived as a
solution for denial-of-service defenses that need to route packets with
different priorities as a means to overcome DoS attacks.

Signed-off-by: Nishanth Devarajan <ndev2021@gmail.com>
Reviewed-by: Michel Machado <michel@digirati.com.br>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-08-14 07:06:43 -07:00
David Ahern
c044be6b34 Merge branch 'iproute2-master' into iproute2-next
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-08-13 07:47:21 -07:00
Toke Høiland-Jørgensen
23a67b008a sch_cake: Make gso-splitting configurable
This patch makes sch_cake's gso/gro splitting configurable
from userspace.

To disable breaking apart superpackets in sch_cake:

tc qdisc replace dev whatever root cake no-split-gso

to enable:

tc qdisc replace dev whatever root cake split-gso

Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk>
Signed-off-by: Dave Taht <dave.taht@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-08-13 07:41:44 -07:00
Keara Leibovitz
e8bd395508 tc: fix bugs for tcp_flags and ip_attr hex output
Fix hex output for both the ip_attr and tcp_flags print functions.

Sample usage:

$ $TC qdisc add dev lo ingress
$ $TC filter add dev lo parent ffff: prio 3 proto ip flower ip_tos 0x8/32
$ $TC fitler add dev lo parent ffff: prio 5 proto ip flower ip_proto tcp \
	tcp_flags 0x909/f00

$ $TC filter show dev lo parent ffff:

filter protocol ip pref 3 flower chain 0
filter protocol ip pref 3 flower chain 0 handle 0x1
  eth_type ipv4
  ip_tos 0x8/32
  not_in_hw
filter protocol ip pref 5 flower chain 0
filter protocol ip pref 5 flower chain 0 handle 0x1
  eth_type ipv4
  ip_proto tcp
  tcp_flags 0x909/f00
  not_in_hw

$ $TC -j filter show dev lo parent ffff:

[{
    "protocol":"ip",
    "pref":3,
    "kind":"flower",
    "chain":0
},{
    "protocol":"ip",
    "pref":3,
    "kind":"flower",
    "chain":0,
    "options": {
	"handle":1,
	"keys": {
	    "eth_type":"ipv4",
	    "ip_tos":"0x8/32"
    },
    "not_in_hw":true
    }
},{
    "protocol":"ip",
    "pref":5,
    "kind":"flower",
    "chain":0
},{
    "protocol":"ip",
    "pref":5,
    "kind":"flower",
    "chain":0,
    "options": {
	"handle":1,
	"keys": {
	    "eth_type":"ipv4",
	    "ip_proto":"tcp",
	    "tcp_flags":"0x909/f00"
	},
	"not_in_hw":true
    }
}]

Signed-off-by: Keara Leibovitz <kleib@mojatatu.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-08-12 14:04:00 -07:00
Stephen Hemminger
d66fdfda71 tc: flush after each command in batch mode
After each command flush output.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-08-08 09:23:48 -07:00
David Ahern
a0bc57e1ef Merge branch 'iproute2-master' into iproute2-next
Conflicts:
	include/uapi/linux/bpf.h

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-07-25 10:08:04 -07:00
Jiri Pirko
afcd06991d tc: introduce support for chain templates
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-07-25 10:00:28 -07:00
Or Gerlitz
761ec9e29f tc/flower: Add match on encapsulating tos/ttl
Add matching on tos/ttl of the IP tunnel headers.

For example, here's decap rule that matches on the tunnel tos:

tc filter add dev vxlan_sys_4789 protocol ip parent ffff: prio 10 flower \
   enc_src_ip 192.168.10.2 enc_dst_ip 192.168.10.1 enc_key_id 100 enc_dst_port 4789 enc_tos 0x30 \
   src_mac e4:11:22:33:44:70 dst_mac e4:11:22:33:44:50  \
   action tunnel_key unset \
   action mirred egress redirect dev eth0_0

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-07-20 08:59:11 -07:00
Or Gerlitz
9f89b0cc0e tc/act_tunnel_key: Enable setup of tos and ttl
Allow to set tos and ttl for the tunnel.

For example, here's encap rule that sets tos to the tunnel:

tc filter add dev eth0_0 protocol ip parent ffff: prio 10 flower \
   src_mac e4:11:22:33:44:50 dst_mac e4:11:22:33:44:70 \
   action tunnel_key set src_ip 192.168.10.1 dst_ip 192.168.10.2 id 100 dst_port 4789 tos 0x30 \
   action mirred egress redirect dev vxlan_sys_4789

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-07-20 08:58:31 -07:00
Toke Høiland-Jørgensen
77c9fbd06e q_cake: Rename autorate_ingress parameter to use dash as word separator
This is consistent with the other multi-word parameters. Also change the
JSON output to be consistent with way it is formatted for the other
options.

Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-07-20 08:46:42 -07:00
Jesus Sanchez-Palencia
b625e36108 tc: Do not use addattr_nest_compat on mqprio and netem
Here we are partially reverting commit c14f9d92ee
"treewide: Use addattr_nest()/addattr_nest_end() to handle nested
attributes" .

As discussed in [1], changing from the 'manually' coded version that
used addattr_l() to addattr_nest_compat() wasn't functionally
equivalent, because now the messages have extra fields appended to it.

This introduced a regression since the implementation of parse_attr()
from both mqprio and netem can't handle this new message format.

Without this fix, mqprio returns an error. netem won't return an error
but its internal configuration ends up wrong.

As an example, this can be reproduced by the following commands when
this patch is not applied:

 1) mqprio
$ tc qdisc replace dev enp3s0 parent root handle 100 mqprio \
	num_tc 3 map 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 \
	queues 1@0 1@1 2@2 hw 0

RTNETLINK answers: Numerical result out of range

 2) netem
$ tc qdisc add dev enp3s0 root netem rate 5kbit 20 100 5 \
	distribution normal latency 1 1

$ tc -s qdisc

(...)
qdisc netem 8001: dev enp3s0 root refcnt 9 limit 1000 delay 0us  0us
 Sent 402 bytes 1 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
(...)

With this patch applied, the tc -s qdisc command above for netem instead
reads:

(...)
qdisc netem 8002: dev enp3s0 root refcnt 9 limit 1000 delay 0us  0us \
	rate 5Kbit packetoverhead 20 cellsize 100 celloverhead 5
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
(...)

[1] https://patchwork.ozlabs.org/patch/867860/#1893405

Fixes: c14f9d92ee ("treewide: Use addattr_nest()/addattr_nest_end() to handle nested attributes")
Reported-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: Jesus Sanchez-Palencia <jesus.sanchez-palencia@intel.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-07-19 15:50:07 -07:00
Toke Høiland-Jørgensen
714444c0cb Add support for CAKE qdisc
sch_cake is intended to squeeze the most bandwidth and latency out of even
the slowest ISP links and routers, while presenting an API simple enough
that even an ISP can configure it.

Example of use on a cable ISP uplink:

tc qdisc add dev eth0 cake bandwidth 20Mbit nat docsis ack-filter

To shape a cable download link (ifb and tc-mirred setup elided)

tc qdisc add dev ifb0 cake bandwidth 200mbit nat docsis ingress wash besteffort

Cake is filled with:

* A hybrid Codel/Blue AQM algorithm, "Cobalt", tied to an FQ_Codel
  derived Flow Queuing system, which autoconfigures based on the bandwidth.
* A novel "triple-isolate" mode (the default) which balances per-host
  and per-flow FQ even through NAT.
* An deficit based shaper, that can also be used in an unlimited mode.
* 8 way set associative hashing to reduce flow collisions to a minimum.
* A reasonable interpretation of various diffserv latency/loss tradeoffs.
* Support for zeroing diffserv markings for entering and exiting traffic.
* Support for interacting well with Docsis 3.0 shaper framing.
* Support for DSL framing types and shapers.
* Support for ack filtering.
* Extensive statistics for measuring, loss, ecn markings, latency variation.

Various versions baking have been available as an out of tree build for
kernel versions going back to 3.10, as the embedded router world has been
running a few years behind mainline Linux. A stable version has been
generally available on lede-17.01 and later.

sch_cake replaces a combination of iptables, tc filter, htb and fq_codel
in the sqm-scripts, with sane defaults and vastly simpler configuration.

Cake's principal author is Jonathan Morton, with contributions from
Kevin Darbyshire-Bryant, Toke Høiland-Jørgensen, Sebastian Moeller,
Ryan Mounce, Tony Ambardar, Dean Scarff, Nils Andreas Svee, Dave Täht,
and Loganaden Velvindron.

Testing from Pete Heist, Georgios Amanakis, and the many other members of
the cake@lists.bufferbloat.net mailing list.

Signed-off-by: Dave Taht <dave.taht@gmail.com>
Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-07-19 09:23:46 -07:00
Qiaobin Fu
697dce7b3a net:sched: add action inheritdsfield to skbedit
The new action inheritdsfield copies the field DS of
IPv4 and IPv6 packets into skb->priority. This enables
later classification of packets based on the DS field.

v4:
* Make tc use netlink helper functions

v3:
* Make flag represented in JSON output as a null value

v2:
* Align the output syntax with the input syntax

* Fix the style issues

Original idea by Jamal Hadi Salim <jhs@mojatatu.com>

Signed-off-by: Qiaobin Fu <qiaobinf@bu.edu>
Reviewed-by: Michel Machado <michel@digirati.com.br>
Reviewed-by: Cong Wang <xiyou.wangcong@gmail.com>
Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Reviewed-by: Stephen Hemminger <stephen@networkplumber.org>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-07-19 09:17:56 -07:00
Jianbo Liu
1f0a5dfd38 tc: flower: Add support for QinQ
To support matching on both outer and inner vlan headers,
we add new cvlan_id/cvlan_prio/cvlan_ethtype for inner vlan header.

Example:
# tc filter add dev eth0 protocol 802.1ad parent ffff: \
    flower vlan_id 1000 vlan_ethtype 802.1q \
        cvlan_id 100 cvlan_ethtype ipv4 \
    action vlan pop \
    action vlan pop \
    action mirred egress redirect dev eth1

# tc filter show dev eth0 ingress
filter protocol 802.1ad pref 1 flower chain 0
filter protocol 802.1ad pref 1 flower chain 0 handle 0x1
  vlan_id 1000
  vlan_ethtype 802.1Q
  cvlan_id 100
  cvlan_ethtype ip
  eth_type ipv4
  in_hw

Signed-off-by: Jianbo Liu <jianbol@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-07-15 13:03:50 -07:00
Vinicius Costa Gomes
7da5ef2200 tc: Add support for the ETF Qdisc
The "Earliest TxTime First" (ETF) queueing discipline allows precise
control of the transmission time of packets by providing a sorted
time-based scheduling of packets.

The syntax is:

tc qdisc add dev DEV parent NODE etf delta <DELTA>
                     clockid <CLOCKID> [offload] [deadline_mode]

Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: Jesus Sanchez-Palencia <jesus.sanchez-palencia@intel.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-07-11 17:50:10 -07:00
Stephen Hemminger
b49759c0e7 tc: don't double print rate
Conversion to print stats in JSON forgot to remove existing
fprintf.

Fixes: 4fcec7f366 ("tc: jsonify stats2")
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-07-09 09:53:45 -07:00
fumihiko kakuma
d529ea2ff4 tc: Fix the bug not to display prio and quantum options of htb
A commandline like 'tc -d class show dev dev-name' does not
display value of prio and quantum option when we use htb qdisc.
This patch fixes the bug.

Signed-off-by: Fumihiko Kakuma <kakuma@valinux.co.jp>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-07-07 09:57:45 -07:00
Roi Dayan
425dcc2741 tc: Fix output of ip attributes
Example output is of tos and ttl.
Befoe this fix the format used %x caused output of the pointer
instead of the intended string created in the out variable.

Fixes: e28b88a464 ("tc: jsonify flower filter")
Signed-off-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-07-07 09:57:45 -07:00
Simon Horman
6217917a38 tc: m_tunnel_key: Add tunnel option support to act_tunnel_key
Allow setting tunnel options using the act_tunnel_key action.

Options are expressed as class:type:data and multiple options
may be listed using a comma delimiter.

 # ip link add name geneve0 type geneve dstport 0 external
 # tc qdisc add dev eth0 ingress
 # tc filter add dev eth0 protocol ip parent ffff: \
     flower indev eth0 \
        ip_proto udp \
        action tunnel_key \
            set src_ip 10.0.99.192 \
            dst_ip 10.0.99.193 \
            dst_port 6081 \
            id 11 \
            geneve_opts 0102:80:00800022,0102:80:00800022 \
    action mirred egress redirect dev geneve0

Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-07-06 09:10:05 -07:00
Keara Leibovitz
4757a54799 tc: jsonify nat action
Add json output support for nat action

Example output:

~$ $TC actions add action nat egress 10.10.10.1 20.20.20.2 index 2
~$ $TC actions add action nat ingress 100.100.100.1/32 200.200.200.2 \
	continue index 99
~$ $TC -j actions ls action nat

[{
	"total acts": 2
}, {
	"actions": [{
		"order": 0,
		"type": "nat",
		"direction": "egress",
		"old_addr": "10.10.10.1/32",
		"new_addr": "20.20.20.2",
		"control_action": {
			"type": "pass"
		},
		"index": 2,
		"ref": 1,
		"bind": 0
	}, {
		"order": 1,
		"type": "nat",
		"direction": "ingress",
		"old_addr": "100.100.100.1/32",
		"new_addr": "200.200.200.2",
		"control_action": {
			"type": "continue"
		},
		"index": 99,
		"ref": 1,
		"bind": 0
	}]
}]

Signed-off-by: Keara Leibovitz <kleib@mojatatu.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-06-20 10:20:34 -07:00
Vlad Buslov
b133392468 tc: fix batch force option
When sending accumulated compound command results an error, check 'force'
option before exiting. Move return code check after putting batch bufs and
freeing iovs to prevent memory leak. Break from loop, instead of returning
error code to allow cleanup at the end of batch function. Don't reset ret
code on each iteration.

Fixes: 485d0c6001 ("tc: Add batchsize feature for filter and actions")
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Chris Mi <chrism@mellanox.com>
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-06-20 09:32:36 -07:00
Keara Leibovitz
831b5d40d9 tc: add json support in csum action
Add json output support for checksum action.

Example output:

~$ $TC actions add action csum udp continue index 7
~$ $TC actions add action csum icmp iph igmp pipe index 200 cookie 112233
~$ $TC -j actions ls action csum

[{
    "total acts":2
}, {
    "actions": [{
        "order":0,
        "csum":"udp",
        "control_action": {
            "type":"continue"
        },
        "index":7,
        "ref":1,
        "bind":0
    }, {
        "order":1,
        "csum":"iph, icmp, igmp",
        "control_action": {
            "type":"pipe"
        },
        "index":200,
        "ref":1,
        "bind":0,
        "cookie":"112233"
    }]
}]

v2:
    Don't initialized char buf[64];
    Add output example

Signed-off-by: Keara Leibovitz <kleib@mojatatu.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-06-05 15:30:30 -07:00
Roman Mashak
53d34eb66c tc: add missing space symbol in ife output
In order to make TDC tests match the output patterns, the missing space
character must be added in the mode output string.

Fixes: 8744c5d338 ("tc: jsonify ife action")
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-05-18 09:10:48 -07:00
Marcelo Ricardo Leitner
ac6a4c2299 tc: flower: add support for verbose logging
Currently there is no way to log offloading errors if the rule is not
explicitly marked as skip_sw, making it hard for other applications such
as Open vSwitch to log why a given could not be offloaded.

This patch adds support for signaling the kernel that more verbose
logging is wanted, which now will include such messages.

Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-05-18 09:06:04 -07:00
David Ahern
7732148d1d Merge branch 'iproute2-master' into iproute2-next
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-05-05 11:07:47 -07:00
Toke Høiland-Jørgensen
bf717756b5 ingress: Don't break JSON output
The dash printed by the ingress qdisc breaks JSON output, so only print it
in regular output mode.

Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-04-25 11:08:39 -07:00
David Ahern
0d93d1e736 Merge branch 'master' into iproute2-next
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-04-23 19:42:21 -07:00
Roman Mashak
0aaf62fcb6 tc: return on invalid smac or dmac in ife action
Return on invalid smac/dmac and use invarg consistently for invalid
arguments report.

Signed-off-by: Roman Mashak <mrv@mojatatu.com>
2018-04-20 10:35:21 -07:00
Stephen Hemminger
0b01f088ee flower: use 16 bit format where possible
Should use print_hu not print_uint for 16 bit value.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-04-20 10:35:00 -07:00
Roman Mashak
8744c5d338 tc: jsonify ife action
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-04-15 17:23:17 -07:00
Roman Mashak
7b17701717 tc: jsonify skbedit action
v2:
   FIxed strings format in print_string()

Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-04-15 17:09:16 -07:00
Roman Mashak
8feb516bfc tc: jsonify tunnel_key action
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-04-08 10:52:33 -07:00
Roman Mashak
1d3c91a7c4 tc: jsonify connmark action
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-04-08 10:52:32 -07:00
Yuval Mintz
0927bf83e7 tc: Correct json output for actions
Commit 9fd3f0b255 ("tc: enable json output for actions") added JSON
support for tc-actions at the expense of breaking other use cases that
reach tc_print_action(), as the latter don't expect the 'actions' array
to be a new object.

Consider the following taken duringrun of tc_chain.sh selftest,
and see the latter command output is broken:

$ ./tc/tc -j -p actions list action gact | grep -C 3 actions
[ {
        "total acts": 1
    },{
        "actions": [ {
                "order": 0,

$ ./tc/tc -p -j -s filter show dev enp3s0np2 ingress | grep -C 3 actions
            },
            "skip_hw": true,
            "not_in_hw": true,{
                "actions": [ {
                        "order": 1,
                        "kind": "gact",
                        "control_action": {

Relocate the open/close of the JSON object to declare the object only
for the case that needs it.

Signed-off-by: Yuval Mintz <yuvalm@mellanox.com>
Tested-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-04-04 16:41:36 -07:00
David Ahern
2c62a64d60 Merge branch 'iproute2-master' into iproute2-next
Conflicts:
	bridge/mdb.c
	misc/ss.c
	tc/tc.c

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-04-02 10:47:34 -07:00
Roman Mashak
7ada016aeb tc: jsonify sample action
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-04-01 08:44:31 -07:00
Roman Mashak
c2f60f5c8e tc: support oneline mode in action generic printer functions
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-04-01 08:37:32 -07:00
Roman Mashak
9fd3f0b255 tc: enable json output for actions
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-03-30 08:55:17 -07:00
Roman Mashak
6e8634eb13 tc: add oneline mode
Add initial support for oneline mode in tc; actions, filters and qdiscs
will be gradually updated in the follow-up patches.

Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-03-30 08:18:58 -07:00
Stephen Hemminger
d5732e3470 ematch: fix possible snprintf overflow
Fixes gcc 8 warning about possible snprint overflow

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-03-29 08:32:43 -07:00
Stephen Hemminger
b8a6088e13 tc_class: fix snprintf warning
Size buffer big enough to avoid any possible overflow.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-03-29 08:32:43 -07:00
Stephen Hemminger
95744efac4 pedit: fix strncpy warning
Newer versions of Gcc warn about string truncation.
Fix by using strlcpy.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-03-29 08:30:28 -07:00
Roman Mashak
d64a22f393 tc: print index, refcnt & bindcnt for nat action
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
2018-03-27 17:00:32 -07:00
Stephen Hemminger
fec62c0ec7 tc: help and whitespace cleanup
Break long lines, and cleanup usage message.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-03-27 15:33:13 -07:00
David Ahern
54eae5f76d Merge branch 'iproute2-master' into iproute2-next
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-03-27 12:33:02 -07:00
Roman Mashak
990b1d90d7 tc: print actual action for connmark action
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
2018-03-27 09:03:15 -07:00
Roi Dayan
17504be81d tc: Fix compilation error with old iptables
The compat_rev field does not exists in old versions of iptables.
e.g. iptables 1.4.

Fixes: dd29621578 ("tc: add em_ipt ematch for calling xtables matches from tc matching context")
Signed-off-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-03-27 06:38:52 -07:00
Roman Mashak
bf7d148803 tc: use get_u32() in psample action to match types
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Acked-by: Yotam Gigi <yotam.gi@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-03-16 13:38:50 -07:00
Roman Mashak
e9fa16583a tc: print actual action for sample action
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-03-16 13:38:38 -07:00
Toke Høiland-Jørgensen
997f2dc193 tc: Add JSON output of fq_codel stats
Enable proper JSON output support for fq_codel in `tc -s qdisc` output.

Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-03-13 18:05:40 -07:00
Toke Høiland-Jørgensen
d7d044ff53 tc: Add missing documentation for codel and fq_codel parameters
Add missing documentation of the memory_limit fq_codel parameter and the
ce_threshold codel and fq_codel parameters.

Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-03-13 18:05:35 -07:00
Pieter Jansen van Vuuren
fb4e6abfca tc: f_flower: Add support for matching first frag packets
Add matching support for distinguishing between first and later fragmented
packets.

 # tc filter add dev eth0 protocol ip parent ffff: \
     flower indev eth0 \
	ip_flags firstfrag \
        ip_proto udp \
    action mirred egress redirect dev eth1

 # tc filter add dev eth0 protocol ip parent ffff: \
     flower indev eth0 \
	ip_flags nofirstfrag \
        ip_proto udp \
    action mirred egress redirect dev eth1

Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-03-13 18:03:21 -07:00
David Ahern
e9625d6aea Merge branch 'iproute2-master' into iproute2-next
Conflicts:
	bridge/mdb.c

Updated bridge/bridge.c per removal of check_if_color_enabled by commit
1ca4341d2c ("color: disable color when json output is requested")

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-03-13 17:48:10 -07:00
Serhey Popovych
fe99adbca4 utils: Introduce and use nodev() helper routine
There is a couple of places where we report error in case of no network
device is found. In all of them we output message in the same format to
stderr and either return -1 or 1 to the caller or exit with -1.

Introduce new helper function nodev() that takes name of the network
device caused error and returns -1 to it's caller. Either call exit()
or return to the caller to preserve behaviour before change.

Use -nodev() in traffic control (tc) code to return 1.

Simplify expression for checking for argument being 0/NULL in @if
statement.

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
2018-03-11 17:58:36 -07:00
Davide Caratti
75ef7b18d2 tc: fix parsing of the control action
If the user didn't specify any control action, don't pop the command line
arguments: otherwise, parsing of the next argument (tipically the 'index'
keyword) results in an error, causing the following 'tc-testing' failures:

 Test a6d6: Add skbedit action with index
 Test 38f3: Delete skbedit action
 Test a568: Add action with ife type
 Test b983: Add action without ife type
 Test 7d50: Add skbmod action to set destination mac
 Test 9b29: Add skbmod action to set source mac
 Test e93a: Delete an skbmod action

Also, add missing parse for 'ok' control action to m_police, to fix the
following 'tc-testing' failure:

 Test 8dd5: Add police action with control ok

tested with:
 # ./tdc.py

test results:
 all tests ok using kernel 4.16-rc2, except 9aa8 "Get a single skbmod
 action from a list" (which is failing also before this commit)

Fixes: 3572e01a09 ("tc: util: Don't call NEXT_ARG_FWD() in __parse_action_control()")
Cc: Michal Privoznik <mprivozn@redhat.com>
Cc: Wolfgang Bumiller <w.bumiller@proxmox.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-03-04 09:01:38 -08:00
Eyal Birger
dd29621578 tc: add em_ipt ematch for calling xtables matches from tc matching context
The commit calls a new tc ematch for using netfilter xtable matches.

This allows early classification as well as mirroning/redirecting traffic
based on logic implemented in netfilter extensions.

Current supported use case is classification based on the incoming IPSec
state used during decpsulation using the 'policy' iptables extension
(xt_policy).

The matcher uses libxtables for parsing the input parameters.

Example use for matching an IPSec state with reqid 1:

tc qdisc add dev eth0 ingress
tc filter add dev eth0 protocol ip parent ffff: \
    basic match 'ipt(-m policy --dir in --pol ipsec --reqid 1)' \
    action drop

This is the user-space counter part of kernel commit ccc007e4a746
("net: sched: add em_ipt ematch for calling xtables matches")

Signed-off-by: Eyal Birger <eyal.birger@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-02-27 09:43:16 -08:00
Eyal Birger
526862038e tc: ematch: add parse_eopt_argv() method for providing ematches with argv parameters
ematche uses YACC to parse ematch arguments and places them in struct bstr
linked lists.

It is useful to be able to receive parameters as argc,argv in order to use
getopt (and alike) argument parsers.

Signed-off-by: Eyal Birger <eyal.birger@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-02-27 09:43:06 -08:00
Adam Vyskovsky
2fb854d07c tc: fix an off-by-one error while printing tc actions
The tc_print_action() function did not print all tc actions
when e.g. TCA_ACT_MAX_PRIO actions were defined for a single
tc filter.

Signed-off-by: Adam Vyskovsky <adamvyskovsky@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-02-23 08:18:29 -08:00
Stephen Hemminger
2d165c0811 tc: implement color output
Implement the -color option; in this case -co is ambiguous
since it was already used for -conf.
For now this just means putting device name in color.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-02-21 09:12:28 -08:00
Serhey Popovych
5433656705 ip: Use single variable to represent -pretty
After commit a233caa0aa ("json: make pretty printing optional") I get
following build failure:

    LINK     rtmon
    ../lib/libutil.a(json_print.o): In function `new_json_obj':
    json_print.c:(.text+0x35): undefined reference to `show_pretty'
    collect2: error: ld returned 1 exit status
    make[1]: *** [rtmon] Error 1
    make: *** [all] Error 2

It is caused by missing show_pretty variable in rtmon.

On the other hand tc/tc.c there are two distinct variables and single
matches() call that handles -pretty option thus setting show_pretty
will never happen. Note that since commit 44dcfe8201 ("Change
formatting of u32 back to default") show_pretty is used in tc/f_u32.c
so this is first place where -pretty introduced.

Furthermore other utilities like misc/ifstat.c and misc/nstat.c define
pretty variable, however only for their own purposes. They both support
JSON output and thus depend show_pretty in new_json_obj().

Assuming above use common variable to represent -pretty option, define
it in utils.c and declare in utils.h that is commonly used. Replace
show_pretty with pretty.

Fixes: a233caa0aa ("json: make pretty printing optional")
Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-02-16 08:13:36 -08:00
Stephen Hemminger
a233caa0aa json: make pretty printing optional
Since JSON is intended for programmatic consumption, it makes
sense for the default output format to be concise as possible.

For programmer and other uses, it is helpful to keep the pretty
whitespace format; therefore enable it with -p flag.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-02-10 08:15:08 -08:00
Serhey Popovych
c14f9d92ee treewide: Use addattr_nest()/addattr_nest_end() to handle nested attributes
We have helper routines to support nested attribute addition into
netlink buffer: use them instead of open coding.

Use addattr_nest_compat()/addattr_nest_compat_end() where appropriate.

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-02-02 15:01:09 -08:00
David Ahern
1e24e773f1 Merge branch 'iproute2-master' into iproute2-next
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-01-29 08:24:57 -08:00
Jakub Kicinski
44c7655186 tc: fix second printing of requeues
Non-JSON tc qdisc output used to print the "requeues" statistic
twice.  Commit 4fcec7f366 ("tc: jsonify stats2") tried to preserve
this behaviour for both standard output and JSON, but used the wrong
statistic (q.qlen).  Also duplicating keys in JSON is not allowed,
so the second occurrence should be completely skipped with JSON.

Fixes: 4fcec7f366 ("tc: jsonify stats2")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-01-27 16:06:54 -08:00
Jakub Kicinski
c061b75895 tc: prio: JSON-ify prio output
Make JSON output work with prio Qdiscs.  This will also make
other qdiscs which reuse the print_qopt work, like mqprio or
pfifo_fast.

Note that there is a double space between "priomap" and first
prio number.  Keep this original behaviour.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-01-26 13:00:18 -08:00
Jakub Kicinski
097415d510 tc: red: JSON-ify RED output
Make JSON output work with RED Qdiscs.  Float/double printing
helpers have to be added/uncommented to print the probability.
Since TC stats in general are not split out to a separate object
the xstats printed by this patch are not separated either.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-01-26 12:59:55 -08:00
David Ahern
6517b5c0ac Merge branch 'iproute2-master' into iproute2-next
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-01-24 09:59:03 -08:00
Wolfgang Bumiller
7ac29190db tc/lexer: let quotes actually start strings
The lexer will go with the longest match, so previously
the starting double quotes of a string would be swallowed by
the [^ \t\r\n()]+ pattern leaving the user no way to
actually use strings with escape sequences.
Fix this by not allowing this case to start with double
quotes.

Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
2018-01-24 08:49:10 -08:00
Jiri Pirko
063463efd7 tc: implement ingress/egress block index attributes for qdiscs
During qdisc creation it is possible to specify shared block for bot
ingress and egress. Pass this values to kernel according to the command
line options.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-01-21 10:42:57 -08:00
Jiri Pirko
0c7cef9669 tc: introduce support for block-handle for filter operations
So far, qdisc was the only handle that could be used to manipulate
filters. Kernel added support for using block to manipulate it. So add
the support to use block index to manipulate filters. The magic
TCM_IFINDEX_MAGIC_BLOCK indicates the block index is in use.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-01-21 10:42:53 -08:00
Jiri Pirko
d0bcedd549 tc: introduce tc_qdisc_block_exists helper
This hepler used qdisc dump to list all qdisc and find if block index in
question is used by any of them. That means the block with specified
index exists.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-01-21 10:42:35 -08:00
David Ahern
8c75f69411 Merge branch 'master' into net-next
Conflicts:
	ip/link_gre.c
	ip/link_gre6.c

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-01-21 09:37:39 -08:00
Jakub Kicinski
e0850bdedc tc: red: allow setting th_min and th_max to the same value
Setting th_min and th_max to the same value may be useful for DCTCP
deployments.  The original DCTCP paper describes it as a simplest way
of achieving simple ECN threshold marking.  Indeed, there doesn't seem
to be any simpler qdisc in Linux which would allow such a setup today.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-01-19 12:35:23 -08:00
Phil Sutter
6f7df6b2a1 tc: Optimize gact action lookup
When adding a filter with a gact action such as 'drop', tc first tries
to open a shared object with equivalent name (m_drop.so in this case)
before trying gact. Avoid this by matching the action name against those
handled by gact prior to calling get_action_kind().

Cc: Jiri Pirko <jiri@mellanox.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: David Ahern <dsahern@gmail.com>
Signed-off-by: Phil Sutter <phil@nwl.cc>
2018-01-17 10:27:47 -08:00
Chris Mi
485d0c6001 tc: Add batchsize feature for filter and actions
Currently in tc batch mode, only one command is read from the batch
file and sent to kernel to process. With this support, at most 128
commands can be accumulated before sending to kernel.

Now it only works for the following successive commands:
1. filter add/delete/change/replace
2. actions add/change/replace

Signed-off-by: Chris Mi <chrism@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-01-14 09:03:35 -08:00
Stephen Hemminger
7d63671030 tc: remove no longer relevant README
This document described how kernel and tc used to handle
timing. In last two years, kernel has switched over to using
ktime. Nothing to see here, move along.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-01-10 08:21:22 -08:00
Jamal Hadi Salim
24a5a48e27 tc: Fix filter protocol output
Fixes: 249284ff5a ("tc: jsonify filter core")
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
2018-01-09 08:09:10 -08:00
Yuval Mintz
b97c6fa71d qdisc: print offload indication
Use the newly added TCA_HW_OFFLOAD indication from kernel
to print a consistent 'offloaded' message to user when listing qdiscs.

Signed-off-by: Yuval Mintz <yuvalm@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-12-27 13:55:16 -08:00
Chris Mi
83cf5bc73b tc: fix command "tc actions del" hang issue
If command is RTM_DELACTION, a non-NULL pointer is passed to rtnl_talk().
Then flag NLM_F_ACK is not set on n->nlmsg_flags and netlink_ack() will
not be called. Command tc will wait for the reply for ever.

Fixes: 86bf43c7c2 ("lib/libnetlink: update rtnl_talk to support malloc buff at run time")
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Chris Mi <chrism@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-12-14 21:17:04 -08:00
Jiri Pirko
1876ab0779 tc: fix json array closing
Fixes: 2704bd6255 ("tc: jsonify actions core")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
2017-12-13 18:16:27 -08:00
Michal Privoznik
3572e01a09 tc: util: Don't call NEXT_ARG_FWD() in __parse_action_control()
Not all callers want parse_action_control*() to advance the
arguments. For instance act_parse_police() does the argument
advancing itself.

Fixes: e67aba5595 ("tc: actions: add helpers to parse and print control actions")
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
2017-12-08 10:29:01 -08:00
Stephen Hemminger
c6a656f4f9 m_mirred: style cleanups
Fix whitespace and long lines.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-11-26 12:42:17 -08:00
Stephen Hemminger
5c235ac27e m_gact: whitespace cleanup
Fix whitespace errors reported by checkpatch

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-11-26 12:38:21 -08:00
Stephen Hemminger
ed4856919f m_action: style cleanup
Break long lines, and use bool where possible.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-11-26 12:36:15 -08:00
Stephen Hemminger
eb4bccf12b m_vlan: style cleanups
Break long lines and make duplicated code into function.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-11-26 12:28:55 -08:00
Jiri Pirko
b021ee40f6 tc: jsonify vlan action
Add json output to vlan action.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
2017-11-26 12:20:51 -08:00
Jiri Pirko
502c4adf19 tc: jsonify mirred action
Add json output to mirred action.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
2017-11-26 12:20:51 -08:00
Jiri Pirko
66fedb6df0 tc: jsonify gact action
Add json output to gact action.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
2017-11-26 12:20:51 -08:00
Jiri Pirko
2704bd6255 tc: jsonify actions core
Add json output to actions core.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
2017-11-26 12:20:51 -08:00
Jiri Pirko
619ca351e3 tc: jsonify matchall filter
Add json output to matchall filter.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
2017-11-26 12:20:51 -08:00
Jiri Pirko
e28b88a464 tc: jsonify flower filter
Add json output to flower filter.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
2017-11-26 12:20:51 -08:00
Jiri Pirko
249284ff5a tc: jsonify filter core
Add json output to filter core.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
2017-11-26 12:20:51 -08:00
Jiri Pirko
f354fa6aa5 tc: jsonify htb qdisc
Add json output to htb qdisc.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
2017-11-26 12:20:51 -08:00
Jiri Pirko
378ac491f5 tc: jsonify fq_codel qdisc
Add json output to fq_codel qdisc.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
2017-11-26 12:20:51 -08:00
Jiri Pirko
4fcec7f366 tc: jsonify stats2
Add json output to stats2.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
2017-11-26 12:20:51 -08:00
Jiri Pirko
c91d262f41 tc: jsonify qdisc core
Add json output to qdisc core.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
2017-11-26 12:20:51 -08:00
Jiri Pirko
81051c60c2 tc: remove action cookie len from printout
Make the output same as input and avoid printout of unnecessary len.

Suggested-by: Stephen Hemminger <stephen@networkplumber.org>
Fixes: fd8b3d2c1b ("actions: Add support for user cookies")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
2017-11-26 12:18:38 -08:00
Jiri Pirko
abff45b802 tc: move action cookie print out of the stats if
Cookie print was made dependent on show_stats for no good reason. Fix
this bu pushing cookie print ot of the stats if.

Fixes: fd8b3d2c1b ("actions: Add support for user cookies")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
2017-11-26 12:18:38 -08:00
Jakub Kicinski
eb91c55731 f_bpf: communicate ifindex for eBPF offload
Split parsing and loading of the eBPF program and if skip_sw is set
load the program for ifindex, to which the qdisc is attached.

Note that the ifindex will be ignored for programs which are already
loaded (e.g. when using pinned programs), but in that case we just
trust the user knows what he's doing.  Hopefully we will get extack
soon in the driver to help debugging this case.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
2017-11-26 11:57:57 -08:00
Jakub Kicinski
01ea76b1cf tc_filter: resolve device name before parsing filter
Move resolving device name into an ifindex before calling filter
specific callbacks.  This way if filters need the ifindex, they
can read it from the request.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
2017-11-26 11:57:57 -08:00
Jakub Kicinski
67c857df80 {f, m}_bpf: don't allow specifying multiple bpf programs
Both BPF filter and action will allow users to specify run
multiple times, and only the last one will be considered by
the kernel.  Explicitly refuse such command lines.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
2017-11-26 11:57:57 -08:00
Jakub Kicinski
399db8392b bpf: rename bpf_parse_common() to bpf_parse_and_load_common()
bpf_parse_common() parses and loads the program.  Rename it
accordingly.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
2017-11-26 11:57:57 -08:00
Jakub Kicinski
658cfebc27 bpf: pass program type in struct bpf_cfg_in
Program type is needed both for parsing and loading of
the program.  Parsing may also induce the type based on
signatures from __bpf_prog_meta.  Instead of passing
the type around keep it in struct bpf_cfg_in.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
2017-11-26 11:57:57 -08:00
Stephen Hemminger
6054c1ebf7 SPDX license identifiers
For all files in iproute2 which do not have an obvious license
identification, mark them with SPDK GPL-2

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-11-24 12:21:35 -08:00
Stephen Hemminger
859af0a5dc tc: break long lines
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-11-24 11:31:36 -08:00
Nishanth Devarajan
927e3cfb52 tc: B.W limits can now be specified in %.
This patch adapts the tc command line interface to allow bandwidth limits
to be specified as a percentage of the interface's capacity.

Adding this functionality requires passing the specified device string to
each class/qdisc which changes the prototype for a couple of functions: the
.parse_qopt and .parse_copt interfaces. The device string is a required
parameter for tc-qdisc and tc-class, and when not specified, the kernel
returns ENODEV. In this patch, if the user tries to specify a bandwidth
percentage without naming the device, we return an error from userspace.

Signed-off-by: Nishanth Devarajan<ndev2021@gmail.com>
2017-11-24 11:22:13 -08:00
Stephen Hemminger
b317557f58 tc: replace magic constant 16 with #define
For places where tc is expecting device name use IFNAMSIZ.
For others where it is a filter name, introduce a new constant.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-11-24 11:19:18 -08:00
Phil Sutter
66942e522e tc_util: Silence spurious compiler warning
GCC version 7.2.1 complains that 'result1' may be used uninitialized in
parse_action_control_slash_spaces(). This should not be possible in
practice, so the actual value 'result1' is initialized with does not
matter.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2017-11-16 16:01:48 -08:00
Phil Sutter
b7c61286de tc_util: Drop needless pointer check
The function parse_action_control_slash() returns early if 'p' is NULL,
so after the first call to action_a2n(), 'p' is guaranteed not to be
NULL. Otherwise, the assignment '*p = 0' above would dereference the
NULL pointer already anyway, so just drop this check here.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2017-11-16 16:01:48 -08:00
Stephen Hemminger
913352fe54 drop unneeded include of syslog.h
Only arpd uses syslog

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-11-12 16:22:36 -08:00
Stephen Hemminger
d72ac5a17b Merge branch 'master' into net-next 2017-11-12 16:17:37 -08:00
Ivan Vecera
6648853975 lib: make resolve_hosts variable common
Any iproute utility that uses any function from lib/utils.c needs
to declare its own resolve_hosts variable instance although it does
not need/use hostname resolving functionality (currently only 'ip'
and 'ss' commands uses this).
The patch declares single common instance of resolve_hosts directly
in utils.c so the existing ones can be removed (the same approach
that is used for timestamp_short).

Cc: Jiri Pirko <jiri@mellanox.com>
Cc: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
2017-11-12 16:15:23 -08:00
Roman Mashak
274b63ae21 tc: distinguish Add/Replace qdisc operations
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
2017-11-12 15:57:08 -08:00
Stephen Hemminger
b158c1790f Merge branch 'master' into net-next 2017-11-09 09:45:17 +09:00
Stephen Hemminger
e4beb52787 netem: use fixed rather than floating point for scaling
Don't need to do floating point math to compute scaled random.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-11-07 11:15:34 +09:00
Amritha Nambiar
0d575c4dac flower: Represent HW traffic classes as classid values
This patch was previously submitted as RFC. Submitting this as
non-RFC now that the classid reservation scheme for hardware
traffic classes and offloads to route packets to a hardware
traffic class are accepted in net-next.

HW traffic classes 0 through 15 are represented using the
reserved classid values :ffe0 - :ffef.

Example:
Match Dst IPv4,Dst Port and route to TC1:
# tc filter add dev eth0 protocol ip parent ffff:\
  prio 1 flower dst_ip 192.168.1.1/32\
  ip_proto udp dst_port 12000 skip_sw\
  hw_tc 1

# tc filter show dev eth0 parent ffff:
filter pref 1 flower chain 0
filter pref 1 flower chain 0 handle 0x1 hw_tc 1
  eth_type ipv4
  ip_proto udp
  dst_ip 192.168.1.1
  dst_port 12000
  skip_sw
  in_hw

Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com>
2017-11-07 11:04:54 +09:00
Vinicius Costa Gomes
c9681ac1b3 tc: Add support for the CBS qdisc
The Credit Based Shaper (CBS) queueing discipline allows bandwidth
reservation with sub-milisecond precision. It is defined by the
802.1Q-2014 specification (section 8.6.8.2 and Annex L).

The syntax is:

tc qdisc add dev DEV parent NODE cbs locredit <LOCREDIT>
   		hicredit <HICREDIT> sendslope <SENDSLOPE>
		idleslope <IDLESLOPE>

(The order is not important)

Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-11-01 22:22:48 +01:00
Amritha Nambiar
e1ac5b06f2 tc/mqprio: Offload mode and shaper options in mqprio
This patch was previously submitted as RFC. Submitting this as
non-RFC now that the tc/mqprio changes are accepted in net-next.

Adds new mqprio options for 'mode' and 'shaper'. The mode
option can take values for offload modes such as 'dcb' (default),
'channel' with the 'hw' option set to 1. The new 'channel' mode
supports offloading TCs and other queue configurations. The
'shaper' option is to support HW shapers ('dcb' default) and
takes the value 'bw_rlimit' for bandwidth rate limiting. The
parameters to the bw_rlimit shaper are minimum and maximum
bandwidth rates. New HW shapers in future can be supported
through the shaper attribute.

# tc qdisc add dev eth0 root mqprio num_tc 2  map 0 0 0 0 1 1 1 1\
  queues 4@0 4@4 hw 1 mode channel shaper bw_rlimit\
  min_rate 1Gbit 2Gbit max_rate 4Gbit 5Gbit

# tc qdisc show dev eth0

qdisc mqprio 804a: root  tc 2 map 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0
             queues:(0:3) (4:7)
             mode:channel
             shaper:bw_rlimit   min_rate:1Gbit 2Gbit   max_rate:4Gbit 5Gbit

v2: Avoid buffer overrun and minor cleanup.

Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com>
2017-11-01 22:20:06 +01:00
Stephen Hemminger
c1606c44b3 Merge branch 'master' into net-next 2017-10-31 18:03:12 +01:00
Alexander Aring
25a24934ab tc: m_ife: fix match tcindex parsing
This patch changes ife_prio to ife_tcindex which is right variable to
assign in the argument in this case.

Signed-off-by: Alexander Aring <aring@mojatatu.com>
2017-10-31 17:56:58 +01:00
Hangbin Liu
86bf43c7c2 lib/libnetlink: update rtnl_talk to support malloc buff at run time
This is an update for 460c03f3f3 ("iplink: double the buffer size also in
iplink_get()"). After update, we will not need to double the buffer size
every time when VFs number increased.

With call like rtnl_talk(&rth, &req.n, NULL, 0), we can simply remove the
length parameter.

With call like rtnl_talk(&rth, nlh, nlh, sizeof(req), I add a new variable
answer to avoid overwrite data in nlh, because it may has more info after
nlh. also this will avoid nlh buffer not enough issue.

We need to free answer after using.

Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: Phil Sutter <phil@nwl.cc>
2017-10-26 12:29:29 +02:00
Stephen Hemminger
2ac0c6c2c1 Merge branch 'master' into net-next 2017-10-25 12:39:18 +02:00
Jamal Hadi Salim
35f2a7639d tc/actions: introduce support for jump action
Sample use case:

... add ingress qdisc
sudo $TC qdisc add dev $ETH ingress

 ... if we exceed rate of 1kbps (burst of 90K), do an absolute jump of 2 actions
sudo $TC actions add action police rate 1kbit burst 90k conform-exceed jump 2 / pipe

sudo $TC -s actions ls action police
 action order 0:  police 0x4 rate 1Kbit burst 23440b mtu 2Kb action jump 2/pipe overhead 0b
 ref 1 bind 0 installed 41 sec used 41 sec
 Action statistics:
  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
  backlog 0b 0p requeues 0

... lets add a couple of marks so we can use them to mark exceed/not exceed
sudo $TC actions add action skbedit mark 11 ok index 11
sudo $TC actions add action skbedit mark 12 ok index 12

... if we dont exceed our rate we get a mark of 11, else mark of 12
sudo $TC filter add dev $ETH parent ffff: protocol ip prio 8 u32 \
match ip dst 127.0.0.8/32 flowid 1:10 \
action police index 4 \
action skbedit index 11 \
action skbedit index 12

Ok, lets keep this thing a little busy..
sudo ping -f -c 10000 127.0.0.8

... now lets see the filters..
sudo $TC -s filter ls dev $ETH parent ffff: protocol ip
filter pref 8 u32 chain 0
filter pref 8 u32 chain 0 fh 800: ht divisor 1
filter pref 8 u32 chain 0 fh 800::800 order 2048 key ht 800 bkt 0 flowid 1:10 not_in_hw  (rule hit 20000 success 10000)
  match 7f000008/ffffffff at 16 (success 10000 )
	action order 1:  police 0x4 rate 1Kbit burst 23440b mtu 2Kb action jump 2/pipe overhead 0b
	ref 2 bind 1 installed 198 sec used 2 sec
	Action statistics:
	Sent 840000 bytes 10000 pkt (dropped 0, overlimits 9721 requeues 0)
	backlog 0b 0p requeues 0

	action order 2:  skbedit mark 11 pass
	 index 11 ref 2 bind 1 installed 127 sec used 2 sec
 	Action statistics:
	Sent 23436 bytes 279 pkt (dropped 0, overlimits 0 requeues 0)
	backlog 0b 0p requeues 0

	action order 3:  skbedit mark 12 pass
	 index 12 ref 2 bind 1 installed 127 sec used 2 sec
 	Action statistics:
	Sent 816564 bytes 9721 pkt (dropped 0, overlimits 0 requeues 0)
	backlog 0b 0p requeues 0

As can be seen 97.21% of the packets were marked as exceeding the allocated
rate; you could do something clever with the skb mark after this.

Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-10-25 12:33:46 +02:00
Stephen Hemminger
4c6080b5c4 Merge branch 'master' into net-next 2017-10-12 09:06:10 -07:00
Stephen Hemminger
268a9eee98 netem: fix code indentation
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-10-11 18:08:15 -07:00
Stephen Hemminger
60509b997d Merge branch 'master' into net-next 2017-10-02 08:04:13 -07:00
Phil Sutter
625df645b7 Check user supplied interface name lengths
The original problem was that something like:

| strncpy(ifr.ifr_name, *argv, IFNAMSIZ);

might leave ifr.ifr_name unterminated if length of *argv exceeds
IFNAMSIZ. In order to fix this, I thought about replacing all those
cases with (equivalent) calls to snprintf() or even introducing
strlcpy(). But as Ulrich Drepper correctly pointed out when rejecting
the latter from being added to glibc, truncating a string without
notifying the user is not to be considered good practice. So let's
excercise what he suggested and reject empty, overlong or otherwise
invalid interface names right from the start - this way calls to
strncpy() like shown above become safe and the user has a chance to
reconsider what he was trying to do.

Note that this doesn't add calls to check_ifname() to all places where
user supplied interface name is parsed. In many cases, the interface
must exist already and is therefore looked up using ll_name_to_index(),
so if_nametoindex() will perform the necessary checks already.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2017-10-02 08:01:21 -07:00
Phil Sutter
ee474849c8 tc: flower: No need to cache indev arg
Since addattrstrz() will copy the provided string into the attribute
payload, there is no need to cache the data.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2017-10-02 08:01:21 -07:00
Yulia Kartseva
73451259da tc: fix ipv6 filter selector attribute for some prefix lengths
Wrong TCA_U32_SEL attribute packing if prefixLen AND 0x1f equals 0x1f.
These are  /31, /63, /95 and /127 prefix lengths.

Example:
ip6 dst face:b00f::/31
filter parent b: protocol ipv6 pref 2307 u32
filter parent b: protocol ipv6 pref 2307 u32 fh 800: ht divisor 1
filter parent b: protocol ipv6 pref 2307 u32 fh 800::800 order 2048
key ht 800 bkt 0
  match faceb00f/ffffffff at 24

v2: previous patch was made with a wrong repo

Signed-off-by: Yulia Kartseva <hex@fb.com>
2017-10-01 13:41:29 -07:00
Stephen Hemminger
58677cc2d3 tc: flower remove unused variable
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-09-20 18:08:43 -07:00
Benjamin LaHaise
7638ee13c1 tc: flower: support for matching MPLS labels
This patch adds support to the iproute2 tc filter command for matching MPLS
labels in the flower classifier.  The ability to match the Time To Live,
Bottom Of Stack, Traffic Control and Label fields are added as options to
the flower filter.

e.g.:
  tc filter add dev eth0 protocol 0x8847 parent ffff: \
    flower mpls_label 1 mpls_tc 2 mpls_ttl 3 mpls_bos 0 \
    action drop

Signed-off-by: Benjamin LaHaise <benjamin.lahaise@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
2017-09-20 18:07:21 -07:00
Eric Dumazet
ff28b7519d tc: fq: support low_rate_threshold attribute
TCA_FQ_LOW_RATE_THRESHOLD sch_fq attribute was added in linux-4.9

Tested:

lpaa5:/tmp# tc -qd add dev eth1 root fq
lpaa5:/tmp# tc -s qd sh dev eth1
qdisc fq 8003: root refcnt 5 limit 10000p flow_limit 1000p buckets 4096 \
 orphan_mask 4095 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1 quantum 3648 \
 initial_quantum 18240 low_rate_threshold 550Kbit refill_delay 40.0ms
 Sent 62139 bytes 395 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
  116 flows (114 inactive, 0 throttled)
  1 gc, 0 highprio, 0 throttled

lpaa5:/tmp# ./netperf -H lpaa6 -t TCP_RR -l10 -- -q 500000 -r 300,300 -o P99_LATENCY
99th Percentile Latency Microseconds
7081

lpaa5:/tmp# tc qd replace dev eth1 root fq low_rate_threshold 10Mbit
lpaa5:/tmp# ./netperf -H lpaa6 -t TCP_RR -l10 -- -q 500000 -r 300,300 -o P99_LATENCY
99th Percentile Latency Microseconds
858

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
2017-09-12 21:33:31 -07:00
Stephen Hemminger
a17a01145f Merge branch 'master' into net-next 2017-09-05 09:33:29 -07:00
Daniel Borkmann
a0b5b7cf5c bpf: consolidate dumps to use bpf_dump_prog_info
Consolidate dump of prog info to use bpf_dump_prog_info() when possible.
Moving forward, we want to have a consistent output for BPF progs when
being dumped. E.g. in cls/act case we used to dump tag as a separate
netlink attribute before we had BPF_OBJ_GET_INFO_BY_FD bpf(2) command.

Move dumping tag into bpf_dump_prog_info() as well, and only dump the
netlink attribute for older kernels. Also, reuse bpf_dump_prog_info()
for XDP case, so we can dump tag and whether program was jited, which
we currently don't show.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2017-09-05 09:26:34 -07:00
Simon Horman
b75e0f6f4b tc actions: store and dump correct length of user cookies
Correct two errors which cancel each other out:
* Do not send twice the length of the actual provided by the user to the kernel
* Do not dump half the length of the cookie provided by the kernel

As the cookie is now stored in the kernel at its correct length rather
than double the that length cookies of up to the maximum size of 16 bytes
may now be stored rather than a maximum of half that length.

Output of dump is the same before and after this change,
but the data stored in the kernel is now exactly the cookie
rather than the cookie + as many trailing zeros.

Before:
 # tc filter add dev eth0 protocol ip parent ffff: \
       flower ip_proto udp action drop \
       cookie 0123456789abcdef0123456789abcdef
 RTNETLINK answers: Invalid argument

After:
 # tc filter add dev eth0 protocol ip parent ffff: \
       flower ip_proto udp action drop \
       cookie 0123456789abcdef0123456789abcdef
 # tc filter show dev eth0 ingress
   eth_type ipv4
   ip_proto udp
   not_in_hw
	 action order 1: gact action drop
	  random type none pass val 0
	  index 1 ref 1 bind 1 installed 1 sec used 1 sec
	 Action statistics:
	 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
	 backlog 0b 0p requeues 0
	 cookie len 16 0123456789abcdef0123456789abcdef

Fixes: fd8b3d2c1b ("actions: Add support for user cookies")
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
2017-09-05 09:25:46 -07:00
Stephen Hemminger
2e706e12d9 Merge branch 'master' into net-next
Needed to add JSON support to tclass.
2017-09-01 12:17:48 -07:00
Phil Sutter
9376314b49 tc_util: No need to terminate an snprintf'ed buffer
snprintf() won't leave the buffer unterminated, so manually terminating
is not necessary here.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2017-09-01 12:10:54 -07:00
Phil Sutter
18f156bfec Convert the obvious cases to strlcpy()
This converts the typical idiom of manually terminating the buffer after
a call to strncpy().

Signed-off-by: Phil Sutter <phil@nwl.cc>
2017-09-01 12:10:54 -07:00
Alexander Aring
38060de1eb tc: m_ife: report about kernels default type
This patch will report about if the ethertype for IFE is not specified
that the default IFE type is used.

Signed-off-by: Alexander Aring <aring@mojatatu.com>
2017-08-30 08:26:46 -07:00
Alexander Aring
664f35aa7c tc: m_ife: print IEEE ethertype format
This patch uses the usually IEEE format to display an ethertype which is
4-digits and every digit in upper case.

Signed-off-by: Alexander Aring <aring@mojatatu.com>
2017-08-30 08:26:46 -07:00
Alexander Aring
bf338b60d4 tc: m_ife: allow ife type to zero
This patch allows to set an ethertype for IFE which is zero. There is no
kernel side validation which forbids a type to zero.

Signed-off-by: Alexander Aring <aring@mojatatu.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
2017-08-30 08:26:46 -07:00
Stephen Hemminger
f474588028 Merge branch 'master' into net-next 2017-08-24 15:30:32 -07:00
Stephen Hemminger
c4fc474b88 tc: use named initializer for default mqprio options
Use C99 initializer

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-08-24 15:28:15 -07:00
Phil Sutter
56270e5466 tc/m_xt: Fix for potential string buffer overflows
- Use strncpy() when writing to target->t->u.user.name and make sure the
  final byte remains untouched (xtables_calloc() set it to zero).
- 'tname' length sanitization was completely wrong: If it's length
  exceeded the 16 bytes available in 'k', passing a length value of 16
  to strncpy() would overwrite the previously NULL'ed 'k[15]'. Also, the
  sanitization has to happen if 'tname' is exactly 16 bytes long as
  well.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2017-08-24 14:53:14 -07:00
Phil Sutter
75716932a0 tc/tc_filter: Make sure filter name is not empty
The later check for 'k[0] != 0' requires a non-empty filter name,
otherwise NULL pointer dereference in 'q' might happen.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2017-08-24 14:49:44 -07:00
Phil Sutter
a754de3ccd tc/q_netem: Don't dereference possibly NULL pointer
Assuming 'opt' might be NULL, move the call to RTA_PAYLOAD to after the
check since it dereferences its parameter.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2017-08-24 14:49:44 -07:00
Stephen Hemminger
5f1df307b4 config: put CFLAGS/LDLIBS in config.mk
This renames Config to config.mk and includes more Make input.
Now configure generates all the required CFLAGS and LDLIBS for
the optional libraries.

Also, use pkg-config to test for libelf, rather than using a test
program. This makes it consistent with other libraries.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-08-23 10:03:09 -07:00
Stephen Hemminger
51186362ba Merge branch 'master' into net-next 2017-08-21 17:37:15 -07:00
Phil Sutter
82ed9ffa2b tc/q_multiq: Don't pass garbage in TCA_OPTIONS
multiq_parse_opt() doesn't change 'opt' at all. So at least make sure
it doesn't fill TCA_OPTIONS attribute with garbage from stack.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2017-08-21 17:17:00 -07:00
Stephen Hemminger
a4b8e88d87 Merge branch 'master' into net-next 2017-08-21 17:14:19 -07:00
Phil Sutter
73aa988868 tc/m_gact: Drop dead code
The use of 'ok' variable in parse_gact() is ineffective: The second
conditional increments it either if *argv is 'gact' or if
parse_action_control() doesn't fail (in which case exit() is called).
So this is effectively an unconditional increment and since no decrement
happens anywhere, all remaining checks for 'ok != 0' can be dropped.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2017-08-21 17:12:21 -07:00
Stephen Hemminger
fa93d9a8aa Merge branch 'master' into net-next 2017-08-18 09:43:00 -07:00
Phil Sutter
3e587d9f43 tc/em_ipset: Don't leak sockfd on error path
Signed-off-by: Phil Sutter <phil@nwl.cc>
2017-08-18 09:16:59 -07:00
Stephen Hemminger
16ab6c47ba Merge branch 'master' into net-next 2017-08-10 16:41:59 -07:00
Daniel Borkmann
8cc360fe48 bpf: unbreak libelf linkage for bpf obj loader
Commit 69fed534a5 ("change how Config is used in Makefile's") moved
HAVE_MNL specific CFLAGS/LDLIBS for building with libmnl out of the
top level Makefile into sub-Makefiles. However, it also removed the
HAVE_ELF specific CFLAGS/LDLIBS entirely, which breaks the BPF object
loader for tc and ip with "No ELF library support compiled in." despite
having libelf detected in configure script. Fix it similarly as in
69fed534a5 for HAVE_ELF.

Fixes: 69fed534a5 ("change how Config is used in Makefile's")
Reported-by: Jeffrey Panneman <jeffrey.panneman@tno.nl>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2017-08-10 16:40:02 -07:00
Stephen Hemminger
e9155685b7 Merge branch 'master' into net-next 2017-08-09 08:41:34 -07:00
Stephen Hemminger
6ff66acc60 tc, ip: more Makefile updates for LIBMNL
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-08-09 08:38:51 -07:00
Jamal Hadi Salim
9e71352581 tc actions: Improved batching and time filtered dumping
dump more than TCA_ACT_MAX_PRIO actions per batch when the kernel
supports it.

Introduced keyword "since" for time based filtering of actions.
Some example (we have 400 actions bound to 400 filters); at
installation time. Using updated when tc setting the time of
interest to 120 seconds earlier (we see 400 actions):
prompt$ hackedtc actions ls action gact since 120000| grep index | wc -l
400

go get some coffee and wait for > 120 seconds and try again:

prompt$ hackedtc actions ls action gact since 120000 | grep index | wc -l
0

Lets see a filter bound to one of these actions:
....
filter pref 10 u32
filter pref 10 u32 fh 800: ht divisor 1
filter pref 10 u32 fh 800::800 order 2048 key ht 800 bkt 0 flowid 1:10  (rule hit 2 success 1)
  match 7f000002/ffffffff at 12 (success 1 )
    action order 1: gact action pass
     random type none pass val 0
     index 23 ref 2 bind 1 installed 1145 sec used 802 sec
    Action statistics:
    Sent 84 bytes 1 pkt (dropped 0, overlimits 0 requeues 0)
    backlog 0b 0p requeues 0
...

that coffee took long, no? It was good.

Now lets ping -c 1 127.0.0.2, then run the actions again:
prompt$ hackedtc actions ls action gact since 120 | grep index | wc -l
1

More details please:
prompt$ hackedtc -s actions ls action gact since 120000

    action order 0: gact action pass
     random type none pass val 0
     index 23 ref 2 bind 1 installed 1270 sec used 30 sec
    Action statistics:
    Sent 168 bytes 2 pkt (dropped 0, overlimits 0 requeues 0)
    backlog 0b 0p requeues 0

And the filter?
filter pref 10 u32
filter pref 10 u32 fh 800: ht divisor 1
filter pref 10 u32 fh 800::800 order 2048 key ht 800 bkt 0 flowid 1:10  (rule hit 4 success 2)
  match 7f000002/ffffffff at 12 (success 2 )
    action order 1: gact action pass
     random type none pass val 0
     index 23 ref 2 bind 1 installed 1324 sec used 84 sec
    Action statistics:
    Sent 168 bytes 2 pkt (dropped 0, overlimits 0 requeues 0)
    backlog 0b 0p requeues 0

Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
2017-08-04 13:16:51 -07:00
Stephen Hemminger
620fc6696d tc: fix m_simple usage
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-08-03 16:10:18 -07:00
Phil Sutter
e2a055dd23 tc-simple: Fix documentation
- CONTROL has to come last, otherwise 'index' applies to gact and not
  simple itself.
- Man page wasn't updated to reflect syntax changes.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2017-08-03 16:02:44 -07:00
Daniel Borkmann
779525cd77 bpf: dump id/jited info for cls/act programs
Make use of TCA_BPF_ID/TCA_ACT_BPF_ID that we exposed and print the ID
of the programs loaded and use the new BPF_OBJ_GET_INFO_BY_FD command
for dumping further information about the program, currently whether
the attached program is jited.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2017-07-18 17:20:45 -07:00
Stephen Hemminger
1fd8a8e23d Merge branch 'master' into net-next 2017-06-27 16:10:55 -07:00
Roman Mashak
fb12cea8d9 tc: fixed typo in usage text.
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
2017-06-21 08:34:28 -07:00
Jiri Benc
59eb271d1d tc: m_tunnel_key: add csum/nocsum option
Allows control of UDP zero checksum.

Signed-off-by: Jiri Benc <jbenc@redhat.com>
2017-06-16 09:11:42 -07:00
Jiri Benc
50907a8245 tc: m_tunnel_key: reformat the usage text
Adding new tunnel key fields would cause the usage line overflow 80 chars.
Make the usage text similar to other commands.

Signed-off-by: Jiri Benc <jbenc@redhat.com>
2017-06-16 09:11:42 -07:00
Jiri Pirko
c794b7b179 tc: don't print error message on miss when parsing action with default
In case default control action parsing takes place, it is ok to miss.
So don't print error message.

Fixes: e67aba5595 ("tc: actions: add helpers to parse and print control actions")
Reported-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Tested-by: Jiri Benc <jbenc@redhat.com>
2017-06-16 09:07:31 -07:00
Jiri Pirko
d5ebd6fdde tc: add support for TRAP action
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
2017-06-08 11:03:12 -07:00
Jiri Pirko
18f05d0601 tc: gact: fix control action parsing
parse_action_control helper does advancing of the arg inside. So don't
do it outside.

Fixes: e67aba5595 ("tc: actions: add helpers to parse and print control actions")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
2017-06-08 11:03:12 -07:00
Or Gerlitz
6ea2c2b1cf tc: flower: add support for matching on ip tos and ttl
Allow users to set flower classifier filter rules which
include matches for ip tos and ttl.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
2017-06-08 10:59:53 -07:00
Jiri Pirko
0c30d14d0a tc: flower: add support for tcp flags
Allow user to insert a flower classifier filter rule which includes
match for tcp flags.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
2017-05-30 17:41:32 -07:00
Stephen Hemminger
2ecb169280 Merge branch 'master' into net-next 2017-05-30 17:40:57 -07:00
Phil Sutter
f6fc1055e4 tc: m_xt: Prevent a segfault in libipt
This happens with NAT targets, such as SNAT, DNAT and MASQUERADE. These
are still not usable with this patch, but at least tc doesn't crash
anymore when one tries to use them.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2017-05-30 17:38:19 -07:00
Roman Mashak
cba134ae70 tc: fix Makefile to build skbmod
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
2017-05-22 13:33:51 -07:00
Jiri Pirko
d19f72f789 tc/actions: introduce support for goto chain action
Allow user to set control action "goto" with filter chain index as
a parameter.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
2017-05-22 13:31:51 -07:00
Jiri Pirko
e67aba5595 tc: actions: add helpers to parse and print control actions
Each tc action is terminated by a control action. Each action parses and
prints then intividually. Introduce set of helpers and allow to share
this code.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
2017-05-22 13:31:51 -07:00
Jiri Pirko
732f03461b tc_filter: add support for chain index
Allow user to put filter to a specific chain identified by index.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
2017-05-22 13:31:51 -07:00
Khem Raj
ae717baf15 tc: include stdint.h explicitly for UINT16_MAX
Fixes
| tc_core.c:190:29: error: 'UINT16_MAX' undeclared (first use in this function); did you mean '__INT16_MAX__'?
|    if ((sz >> s->size_log) > UINT16_MAX) {
|                              ^~~~~~~~~~

Signed-off-by: Khem Raj <raj.khem@gmail.com>
2017-05-22 11:41:53 -07:00
Amir Vadai
f3e1b2448a pedit: Introduce ipv6 support
Add support for modifying IPv6 headers using pedit.

Signed-off-by: Amir Vadai <amir@vadai.me>
2017-05-15 15:05:20 -07:00
Amir Vadai
a13426fe1a pedit: Check for extended capability in protocol parser
Do not allow using eth and udp header types if non-extended pedit kABI
is being used. Other protocol parsers already have this check.

Signed-off-by: Amir Vadai <amir@vadai.me>
2017-05-15 15:05:20 -07:00
Amir Vadai
cdca191862 pedit: Do not allow using retain for too big fields
Using retain for fields longer than 32 bits is not supported.
Do not allow user to do it.

Signed-off-by: Amir Vadai <amir@vadai.me>
2017-05-15 15:05:20 -07:00
Amir Vadai
290cdc058d pedit: Fix a typo in warning
'ex' attribute should be placed after 'action pedit' and not after
'munge'.

Signed-off-by: Amir Vadai <amir@vadai.me>
2017-05-15 15:05:20 -07:00
Or Gerlitz
e57285b81a tc: Reflect HW offload status
Currently there is no way of querying whether a filter is
offloaded to HW or not when using "both" policy (where none
of skip_sw or skip_hw flags are set by user-space).

Add two new flags, "in hw" and "not in hw" such that user
space can determine if a filter is actually offloaded to
hw or not. The "in hw" UAPI semantics was chosen so it's
similar to the "skip hw" flag logic.

If none of these two flags are set, this signals running
over older kernel.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
2017-05-05 09:49:25 -07:00
Stephen Hemminger
d2b9100a08 Merge branch 'master' into net-next 2017-05-01 09:26:51 -07:00
Stephen Hemminger
1e600da057 pedit: fix whitespace
Add newlines to break long lines.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-05-01 09:25:22 -07:00
Or Gerlitz
3d2a7781ec tc/pedit: p_udp: introduce pedit udp support
For example, forward udp traffic destined to port 999 to veth0 and set
tcp port to 888:
$ tc filter add dev enp0s9 protocol ip parent ffff: \
    flower \
      ip_proto udp \
      dst_port 999 \
    action pedit ex munge \
      udp dport set 888 \
    action mirred egress \
      redirect dev veth0

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Amir Vadai <amir@vadai.me>
2017-05-01 09:22:16 -07:00
Amir Vadai
2c6eb12ab8 tc/pedit: p_tcp: introduce pedit tcp support
For example, forward tcp traffic destined to port 80 to veth0 and set
tcp port to 8080:
$ tc filter add dev enp0s9 protocol ip parent ffff: \
    flower \
      ip_proto tcp \
      dst_port 80 \
    action pedit ex munge \
      tcp dport set 8080 \
    action mirred egress \
      redirect dev veth0

Signed-off-by: Amir Vadai <amir@vadai.me>
2017-05-01 09:22:16 -07:00
Amir Vadai
3cd5149ecd tc/pedit: p_eth: ETH header editor
For example, forward tcp traffic to veth0 and set
destination mac address to 11:22:33:44:55:66 :
$ tc filter add dev enp0s9 protocol ip parent ffff: \
    flower \
      ip_proto tcp \
    action pedit ex munge \
      eth dst set 11:22:33:44:55:66 \
    action mirred egress \
      redirect dev veth0

Signed-off-by: Amir Vadai <amir@vadai.me>
2017-05-01 09:22:16 -07:00
Amir Vadai
fa4652ff3b tc/pedit: Support fields bigger than 32 bits
Make parse_val() accept fields up to 128 bits long, this should be
enough for current use cases and involves a minimal change to code.

Signed-off-by: Amir Vadai <amir@vadai.me>
2017-05-01 09:22:16 -07:00
Amir Vadai
8d193d9607 tc/pedit: p_ip: introduce editing ttl header
Enable user to edit IP header ttl field.

For example, to forward any TCP packet and decrease its TTL by one:
$ tc filter add dev enp0s9 protocol ip parent ffff: \
    flower \
      ip_proto tcp \
    action pedit ex munge \
      ip ttl add 0xff pipe \
    action mirred egress \
      redirect dev veth0

Signed-off-by: Amir Vadai <amir@vadai.me>
2017-05-01 09:22:16 -07:00
Amir Vadai
c05ddaf9e0 tc/pedit: Introduce 'add' operation
This command could be useful to increase/decrease fields value.

Signed-off-by: Amir Vadai <amir@vadai.me>
2017-05-01 09:22:16 -07:00
Amir Vadai
7c71a40cbd tc/pedit: Extend pedit to specify offset relative to mac/transport headers
Utilize the extended pedit netlink to set an offset relative to a
specific header type. Old netlink only enabled the user to set
approximated  offset relative to the IPv4 header.

To use this extended functionality need to use the 'ex' keyword after
'pedit' and before any 'munge'.
e.g:
$ tc filter add dev ens9 protocol ip parent ffff: \
    flower \
      ip_proto udp \
      dst_port 80 \
    action pedit ex munge \
      ip dst set 1.1.1.1 \
      pipe \
    action mirred egress redirect dev veth0

Signed-off-by: Amir Vadai <amir@vadai.me>
2017-05-01 09:22:16 -07:00
Amir Vadai
51536ebbe8 tc/pedit: Fix a typo in pedit usage message
Signed-off-by: Amir Vadai <amir@vadai.me>
2017-05-01 09:22:16 -07:00
Stephen Hemminger
590dde3a98 Merge branch 'master' into net-next 2017-04-23 09:14:35 -07:00
Jamal Hadi Salim
fd8b3d2c1b actions: Add support for user cookies
Make use of 128b user cookies

Introduce optional 128-bit action cookie.
Like all other cookie schemes in the networking world (eg in protocols
like http or existing kernel fib protocol field, etc) the idea is to
save user state that when retrieved serves as a correlator. The kernel
_should not_ intepret it. The user can store whatever they wish in the
128 bits.

Sample exercise(showing variable length use of cookie)

.. create an accept action with cookie a1b2c3d4
sudo $TC actions add action ok index 1 cookie a1b2c3d4

.. dump all gact actions..
sudo $TC -s actions ls action gact

    action order 0: gact action pass
     random type none pass val 0
     index 1 ref 1 bind 0 installed 5 sec used 5 sec
    Action statistics:
    Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
    backlog 0b 0p requeues 0
    cookie a1b2c3d4

.. bind the accept action to a filter..
sudo $TC filter add dev lo parent ffff: protocol ip prio 1 \
u32 match ip dst 127.0.0.1/32 flowid 1:1 action gact index 1

... send some traffic..
$ ping 127.0.0.1 -c 3
PING 127.0.0.1 (127.0.0.1) 56(84) bytes of data.
64 bytes from 127.0.0.1: icmp_seq=1 ttl=64 time=0.020 ms
64 bytes from 127.0.0.1: icmp_seq=2 ttl=64 time=0.027 ms
64 bytes from 127.0.0.1: icmp_seq=3 ttl=64 time=0.038 ms

Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
2017-04-23 09:10:02 -07:00
Stephen Hemminger
f4878dfae4 Merge branch 'master' into net-next 2017-04-04 14:56:41 -07:00
Roman Mashak
878babffec tc: print skbedit action when dumping actions.
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
2017-04-04 14:48:54 -07:00
Jiri Kosina
7c581a124d iproute2: add support for invisible qdisc dumping
Support the new TCA_DUMP_INVISIBLE netlink attribute that allows asking
kernel to perform 'full qdisc dump', as for historical reasons some of the
default qdiscs are being hidden by the kernel.

The command syntax is being extended by voluntary 'invisible' argument to
'tc qdisc show'.

Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2017-03-14 16:37:08 -07:00
Stephen Hemminger
60ccfcd7f2 pie: remove always false condition
When built with GCC warnings enabled:
q_pie.c: In function ‘pie_parse_opt’:
q_pie.c:78:38: warning: comparison of unsigned expression < 0 is always false [-Wtype-limits]
        (alpha > ALPHA_MAX) || (alpha < ALPHA_MIN)) {
                                      ^
q_pie.c:85:35: warning: comparison of unsigned expression < 0 is always false [-Wtype-limits]
        (beta > BETA_MAX) || (beta < BETA_MIN)) {
                                   ^

This is because MIN is 0 and unsigned number can never be less than 0.
Therefore just remove the _MIN values.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-03-10 08:58:01 -08:00
Stephen Hemminger
a59b616200 tc: use rta_getattr_u32
Don't cast RTA_DATA use newish accessors.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-02-24 15:24:34 -08:00
Jiri Kosina
be67f81297 iproute2: tc: introduce build dependency on libnetlink
Rebuilding libnetlink doesn't trigger rebuild of tc, which is wrong
(especially so for builds where libnetlink.a gets statically linked into
tc). Fix that by introducing an explicit dependency.

Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2017-02-24 15:11:32 -08:00
Stephen Hemminger
9f1370c0e5 netlink route attribute cleanup
Use the new helper functions rta_getattr_u* instead of direct
cast of RTA_DATA().  Where RTA_DATA() is a structure, then remove
the unnecessary cast since RTA_DATA() is void *

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-02-24 08:56:38 -08:00
Daniel Borkmann
e37d706b56 {f,m}_bpf: dump tag over insns
We already export TCA_BPF_TAG resp. TCA_ACT_BPF_TAG from kernel commit
f1f7714ea51c ("bpf: rework prog_digest into prog_tag"), thus also dump
it when filter/actions are shown.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2017-02-23 09:02:19 -08:00
Roi Dayan
164a9ff401 tc: flower: Fix parsing ip address
Fix order of arguments when passed to __flower_parse_ip_addr.

Fixes: ("f888f4e20534 tc: flower: Support matching ARP")
Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
2017-02-23 09:01:15 -08:00
Stephen Hemminger
732b18af97 Merge branch 'merge-4.10' into next-merge 2017-02-17 15:32:28 -08:00
Simon Horman
6374961a00 tc: flower: support masked ICMP code and type match
Extend ICMP code and type match to support masks.

Also add missing documentation to synopsis in manpage.

tc qdisc add dev eth0 ingress
tc filter add dev eth0 protocol ipv6 parent ffff: flower \
	indev eth0 ip_proto icmpv6 type 128/240 code 0 action drop

Signed-off-by: Simon Horman <simon.horman@netronome.com>
2017-02-17 15:32:03 -08:00
Simon Horman
9d36e54f36 tc: flower: provide generic masked u8 print helper
Provide generic masked u8 print helper and use it to print arp operations.

Also:
* Make name parameter of arp op print helper const.
* Consistently use __u8 rather than uint8_t, in keeping with the
  pervasive style in the file.

Signed-off-by: Simon Horman <simon.horman@netronome.com>
2017-02-17 15:32:03 -08:00
Simon Horman
180136e540 tc: flower: provide generic masked u8 parser helper
Provide generic masked u8 paser helper and use it to parse arp operations.

Also consistently use __u8 rather than uint8_t, in keeping with the
pervasive style in the file.

Signed-off-by: Simon Horman <simon.horman@netronome.com>
2017-02-17 15:32:03 -08:00
Or Gerlitz
afdc1fed24 tc: matchall: Print skip flags when dumping a filter
Print the skip flags when we dump a filter.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Acked by: Yotam Gigi <yotamg@mellanox.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
2017-02-17 15:25:24 -08:00
Simon Horman
c7ec052bb8 tc: flower: Update documentation to indicate ARP takes IPv4 prefixes
Unlike other PREFIXes documented in the usage for tc flower, which accept
both IPv4 and IPv6 prefixes, arp_sip and arp_tip only accepts IPv4
prefixes.

Signed-off-by: Simon Horman <simon.horman@netronome.com>
2017-02-08 11:39:33 -08:00
Simon Horman
81f6e5a727 tc: flower: use correct type when calling flower_icmp_attr_type
Use enum flower_icmp_field rather than bool as type of third parameter
when calling flower_icmp_attr_type.

Fixes: eb3b5696f1 ("tc: flower: support matching on ICMP type and code")
Signed-off-by: Simon Horman <simon.horman@netronome.com>
2017-02-08 11:37:44 -08:00
Yotam Gigi
0b1abd84fb tc: Add support for the sample tc action
The sample tc action allows sampling packets matching a classifier. It
peeks randomly packets, and samples them using the psample netlink
channel. The user can specify the psample group, which the packet will be
sampled to, the sampling rate and the packet truncation (to save
kernel-user traffic).

The sampled packets contain informative metadata, for example, the input
interface and the original packet length.

The action syntax:
tc filter add [...] \
	action sample rate <RATE> group <GROUP> [trunc <SIZE>]
	[...]

Where:
  RATE := The sampling rate which is the ratio of packets observed at the
	  data source to the samples generated
  GROUP := the psample module sampling group
  SIZE := optional truncation size

An example for a common usecase of the sample tc action: to sample ingress
traffic from interface eth1, one may use the commands:

tc qdisc add dev eth1 handle ffff: ingress

tc filter add dev eth1 parent ffff: \
       matchall action sample rate 12 group 4

Where the first command adds an ingress qdisc and the second starts
sampling randomly with an average of one sampled packet per 12 packets
on dev eth1 to psample group 4.

Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
2017-02-06 14:24:52 -08:00
Stephen Hemminger
fefc93bb28 Merge branch 'master' into net-next 2017-01-29 20:30:05 -08:00
Roman Mashak
31951c47e9 tc: distinguish Add/Replace action operations.
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Acked-by: Phil Sutter <phil@nwl.cc>
2017-01-29 20:26:44 -08:00
Benjamin LaHaise
4f7d406f5d f_flower: don't set TCA_FLOWER_KEY_ETH_TYPE for "protocol all"
v2 - update to address changes in 00697ca19a.

When using the tc flower filter, rules marked with "protocol all" do not
actually match all packets.  This is due to a bug in f_flower.c that passes
in ETH_P_ALL in the TCA_FLOWER_KEY_ETH_TYPE attribute when adding a rule.
Fix this by omitting TCA_FLOWER_KEY_ETH_TYPE if the protocol is set to
ETH_P_ALL.

Fixes: 488b41d020 ("tc: flower no need to specify the ethertype")
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Benjamin LaHaise <benjamin.lahaise@netronome.com>
Signed-off-by: Benjamin LaHaise <bcrl@kvack.org>
Reviewed-by: Roi Dayan <roid@mellanox.com>
2017-01-29 20:23:58 -08:00
Paul Blakey
08f66c80c0 tc: flower: Refactor matching flags to be more user friendly
Instead of "magic numbers" we can now specify each flag
by name. Prefix of "no"  (e.g nofrag) unsets the flag,
otherwise it wil be set.

Example:
    # add a flower filter that will drop fragmented packets
    tc filter add dev ens4f0 protocol ip parent ffff: \
            flower \
            src_mac e4:1d:2d:fd:8b:01 \
            dst_mac e4:1d:2d:fd:8b:02 \
            indev ens4f0 \
            ip_flags frag \
    action drop

    # add a flower filter that will drop non-fragmented packets
    tc filter add dev ens4f0 protocol ip parent ffff: \
            flower \
            src_mac e4:1d:2d:fd:8b:01 \
            dst_mac e4:1d:2d:fd:8b:02 \
            indev ens4f0 \
            ip_flags nofrag \
    action drop

Fixes: 22a8f01989 ('tc: flower: support matching flags')
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-01-20 10:36:45 -08:00
Davide Caratti
6561cb28f2 tc: m_csum: add support for SCTP checksum
'sctp' parameter can now be used as 'csum' target to enable CRC32c
computation on SCTP packets.

Signed-off-by: Davide Caratti <dcaratti@redhat.com>
2017-01-20 09:32:08 -08:00
Stephen Hemminger
9174b4cf3e Merge branch 'master' into net-next 2017-01-20 09:27:57 -08:00
Roi Dayan
00697ca19a tc: flower: Fix incorrect error msg about eth type
addattr16 may return an error about the nl msg size
but not about incorrect eth type.

Fixes: 488b41d020 ("tc: flower no need to specify the ethertype")
Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Paul Blakey <paulb@mellanox.com>
2017-01-20 09:27:34 -08:00
Roi Dayan
c85609b25f tc: flower: Add missing err check when parsing flower options
addattr32 may return an error.

Fixes: cfcabf18d8 ("tc: flower: Add skip_{hw|sw} support")
Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Paul Blakey <paulb@mellanox.com>
2017-01-20 09:27:34 -08:00
Roi Dayan
b2141de1ad tc: flower: Fix flower output for src and dst ports
This fix a missing use case after the introduction of enum flower_endpoint.

Fixes: 6910d65661 ("tc: flower: introduce enum flower_endpoint")
Signed-off-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Paul Blakey <paulb@mellanox.com>
2017-01-17 08:45:22 -08:00
Phil Sutter
a05b9557f4 tc: m_xt: Drop needless parentheses from #if checks
Signed-off-by: Phil Sutter <phil@nwl.cc>
2017-01-13 16:33:54 -08:00
Simon Horman
f888f4e205 tc: flower: Support matching ARP
Support matching on ARP operation, and hardware and protocol addresses
for Ethernet hardware and IPv4 protocol addresses.

Example usage:

tc qdisc add dev eth0 ingress

tc filter add dev eth0 protocol arp parent ffff: flower indev eth0 \                    arp_op request arp_sip 10.0.0.1 action drop
tc filter add dev eth0 protocol rarp parent ffff: flower indev eth0 \                   arp_op reply arp_tha 52:54:3f:00:00:00/24 action drop

Signed-off-by: Simon Horman <simon.horman@netronome.com>
2017-01-12 17:46:37 -08:00
Stephen Hemminger
51dd3455a3 Merge branch 'master' into net-next 2017-01-12 17:44:44 -08:00