Warning was:
q_pie.c:202:22: error: implicit conversion from 'unsigned long' to
'double'
Fixes: 492ec9558b ("tc: pie: change maximum integer value of tc_pie_xstats->prob")
Cc: Leslie Monis <lesliemonis@gmail.com>
Signed-off-by: Brian Vazquez <brianvv@google.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
PIE now uses per packet timestamps to calculate queuing
delay. The average dequeue rate based queue delay
calculation is now made optional. This patch adds the option
to enable or disable the use of Little's law to calculate
queuing delay.
Signed-off-by: Gautam Ramakrishnan <gautamramk@gmail.com>
Signed-off-by: Leslie Monis <lesliemonis@gmail.com>
Signed-off-by: Mohit P. Tahiliani <tahiliani@nitk.edu.in>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Add functions for big endian masked numbers as a pre-step towards masked
port numbers.
Signed-off-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
Add u16 big endian parse option as a pre-step towards TCP/UDP/SCTP
ports usage.
Signed-off-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
Fix the output for ip tos and ttl to be numbers in JSON format.
Example:
$ tc qdisc add dev eth0 ingress
$ tc filter add dev eth0 protocol ip parent ffff: prio 1 flower skip_hw \
ip_tos 5/0xf action drop
Non JSON format remains the same:
$ tc filter show dev eth0 parent ffff:
filter protocol ip pref 1 flower chain 0
filter protocol ip pref 1 flower chain 0 handle 0x1
eth_type ipv4
ip_tos 5/0xf
skip_hw
not_in_hw
action order 1: gact action drop
random type none pass val 0
index 1 ref 1 bind 1
JSON format is changed (partial output):
$ tc -p -j filter show dev eth0 parent ffff:
Before:
"options": {
"keys": {
"ip_tos": "0x5/f",
...
After:
"options": {
"keys": {
"ip_tos": 5,
"ip_tos_mask": 15,
...
Fixes: 6ea2c2b1cf ("tc: flower: add support for matching on ip tos and ttl")
Signed-off-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Add an option to print masked numbers with or without a newline, as a
pre-step towards using a common function.
Signed-off-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Introduce a function to print masked number with a different output for
JSON or non-JSON methods, as a pre-step towards printing numbers using
this common function.
Signed-off-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Implement setting and printing of action flags with single available flag
value "no_percpu" that translates to kernel UAPI TCA_ACT_FLAGS value
TCA_ACT_FLAGS_NO_PERCPU_STATS. Update man page with information regarding
usage of action flags.
Example usage:
# tc actions add action gact drop no_percpu
# sudo tc actions list action gact
total acts 1
action order 0: gact action drop
random type none pass val 0
index 1 ref 1 bind 0
no_percpu
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
Function parse_ct() manually calls NEXT_ARG_FWD() after
parse_action_control_dflt(). This is redundant because
parse_action_control_dflt() modifies argc and argv itself. Moreover, such
implementation parses out any following actions option. For example, adding
action ct with cookie errors:
$ sudo tc actions add action ct cookie 111111111111
Bad action type 111111111111
Usage: ... gact <ACTION> [RAND] [INDEX]
Where: ACTION := reclassify | drop | continue | pass | pipe |
goto chain <CHAIN_INDEX> | jump <JUMP_COUNT>
RAND := random <RANDTYPE> <ACTION> <VAL>
RANDTYPE := netrand | determ
VAL : = value not exceeding 10000
JUMP_COUNT := Absolute jump from start of action list
INDEX := index value used
With fix:
$ sudo tc actions add action ct cookie 111111111111
$ sudo tc actions list action ct
total acts 1
action order 0: ct zone 0 pipe
index 1 ref 1 bind 0
cookie 111111111111
Fixes: c8a494314c ("tc: Introduce tc ct action")
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
tc segfaults if gact action is used without action or index:
$ ip link add type dummy
$ tc actions add action pipe index 1
$ tc filter add dev dummy0 parent ffff: protocol ip \
pref 10 u32 match ip src 127.0.0.2 flowid 1:10 action gact
Segmentation fault
We expect tc to fail gracefully with an error message.
This happens if gact is the last argument of the incomplete
command. In this case the "gact" action is parsed, the macro
NEXT_ARG_FWD() is executed and the next matches() crashes
because of null argv pointer.
To avoid this, simply use NEXT_ARG() instead.
With this change in place:
$ ip link add type dummy
$ tc actions add action pipe index 1
$ tc filter add dev dummy0 parent ffff: protocol ip \
pref 10 u32 match ip src 127.0.0.2 flowid 1:10 action gact
Command line is not complete. Try option "help"
Fixes: fa49588973 ("tc: Fix binding of gact action by index.")
Reported-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
For high speed adapter like Mellanox CX-5 card, it can reach upto
100 Gbits per second bandwidth. Currently htb already supports 64bit rate
in tc utility. However police action rate and peakrate are still limited
to 32bit value (upto 32 Gbits per second). Taking advantage of the 2 new
attributes TCA_POLICE_RATE64 and TCA_POLICE_PEAKRATE64 from kernel,
tc can use them to break the 32bit limit, and still keep the backward
binary compatibility.
Tested-by: David Dai <zdai@linux.vnet.ibm.com>
Signed-off-by: David Dai <zdai@linux.vnet.ibm.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
The revert of batchsize accidently reverted more than it should
and broke shared block functionality. Fix this by restoring the
original functionality.
To reproduce:
dst_ip 192.0.2.0/24 action drop
Unknown filter "block", hence option "10" is unparsable
Fixes: e991c04d64 ("Revert "tc: Add batchsize feature for filter and actions"")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
This adds support for setting the txtime_delay parameter which is useful
for the txtime offload mode of taprio.
Signed-off-by: Vedang Patel <vedang.patel@intel.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
This allows a new parameter, flags, to be passed to taprio. Currently, it
only supports enabling the txtime-assist mode. But, we plan to add
different modes for taprio (e.g. hardware offloading) and this parameter
will be useful in enabling those modes.
Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: Vedang Patel <vedang.patel@intel.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
ETF Qdisc currently checks for a socket with SO_TXTIME socket option. If
either is not present, the packet is dropped. In the future commits, we
want other Qdiscs to add packet with launchtime to the ETF Qdisc. Also,
there are some packets (e.g. ICMP packets) which may not have a socket
associated with them. So, add an option to skip this check.
Signed-off-by: Vedang Patel <vedang.patel@intel.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
New tc action to send packets to conntrack module, commit
them, and set a zone, labels, mark, and nat on the connection.
It can also clear the packet's conntrack state by using clear.
Usage:
ct clear
ct commit [force] [zone] [mark] [label] [nat]
ct [nat] [zone]
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Yossi Kuperman <yossiku@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
Strict netlink validation now requires this flag on all nested
attributes, add it for action options.
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
parse_percent() currently allows to specify negative percentages
or value above 100%. However this does not seems to make sense,
as the function is used for probabilities or bandiwidth rates.
Moreover, using negative values leads to erroneous results
(using Bernoulli loss model as example):
$ ip link add test type dummy
$ ip link set test up
$ tc qdisc add dev test root netem loss gemodel -10% limit 10
$ tc qdisc show dev test
qdisc netem 800c: root refcnt 2 limit 10 loss gemodel p 90% r 10% 1-h 100% 1-k 0%
Using values above 100% we have instead:
$ ip link add test type dummy
$ ip link set test up
$ tc qdisc add dev test root netem loss gemodel 140% limit 10
$ tc qdisc show dev test
qdisc netem 800f: root refcnt 2 limit 10 loss gemodel p 40% r 60% 1-h 100% 1-k 0%
This commit changes parse_percent() with a check to ensure
percentage values stay between 1.0 and 0.0.
parse_percent_rate() function, which already employs a similar
check, is adjusted accordingly.
With this check in place, we have:
$ ip link add test type dummy
$ ip link set test up
$ tc qdisc add dev test root netem loss gemodel -10% limit 10
Illegal "loss gemodel p"
Fixes: 927e3cfb52 ("tc: B.W limits can now be specified in %.")
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Many tc modules were printing error messages to stdout.
This is problematic if using JSON or other output formats.
Change all these places to use fprintf(stderr, ...) instead.
Also, remove unnecessary initialization and places
where else is used after error return.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Create a new action type for TC that allows the pushing, popping, and
modifying of MPLS headers.
Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
Add 32-bit missing mask attribute in iproute2/tc, which has been long
supported by the kernel side.
v2: print value in hex with print_hex() as suggested by Stephen Hemminger.
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
As the man page for tc netem states:
To use the Bernoulli model, the only needed parameter is p while the
others will be set to the default values r=1-p, 1-h=1 and 1-k=0.
However r parameter is erroneusly set to 1, and not to 1-p.
Fix this using the same approach of the 4-state loss model.
Fixes: 3c7950af59 ("netem: add support for 4 state and GE loss model")
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Add JSON output support to q_netem.
The normal output is untouched.
In JSON output always use seconds as the base of time units,
and non-percentage numbers (0.01 instead of 1%). Try to always
report the fields, even if they are zero.
All this should make the output more machine-friendly.
v2: less macroes
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
Add a new parameter '-Numeric' to show the number of protocol, scope,
dsfield, etc directly instead of converting it to human readable name.
Do the same on tc and ss.
This patch is based on David Ahern's previous patch.
Suggested-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
ctinfo is a tc action restoring data stored in conntrack marks to
various fields. At present it has two independent modes of operation,
restoration of DSCP into IPv4/v6 diffserv and restoration of conntrack
marks into packet skb marks.
It understands a number of parameters specific to this action in
additional to the usual action syntax. Each operating mode is
independent of the other so all options are optional, however not
specifying at least one mode is a bit pointless.
Usage: ... ctinfo [dscp mask [statemask]] [cpmark [mask]] [zone ZONE]
[CONTROL] [index <INDEX>]
DSCP mode
dscp enables copying of a DSCP stored in the conntrack mark into the
ipv4/v6 diffserv field. The mask is a 32bit field and specifies where
in the conntrack mark the DSCP value is located. It must be 6
contiguous bits long. eg. 0xfc000000 would restore the DSCP from the
upper 6 bits of the conntrack mark.
The DSCP copying may be optionally controlled by a statemask. The
statemask is a 32bit field, usually with a single bit set and must not
overlap the dscp mask. The DSCP restore operation will only take place
if the corresponding bit/s in conntrack mark ANDed with the statemask
yield a non zero result.
eg. dscp 0xfc000000 0x01000000 would retrieve the DSCP from the top 6
bits, whilst using bit 25 as a flag to do so. Bit 26 is unused in this
example.
CPMARK mode
cpmark enables copying of the conntrack mark to the packet skb mark. In
this mode it is completely equivalent to the existing act_connmark
action. Additional functionality is provided by the optional mask
parameter, whereby the stored conntrack mark is logically ANDed with the
cpmark mask before being stored into skb mark. This allows shared usage
of the conntrack mark between applications.
eg. cpmark 0x00ffffff would restore only the lower 24 bits of the
conntrack mark, thus may be useful in the event that the upper 8 bits
are used by the DSCP function.
Usage: ... ctinfo [dscp mask [statemask]] [cpmark [mask]] [zone ZONE]
[CONTROL] [index <INDEX>]
where :
dscp MASK is the bitmask to restore DSCP
STATEMASK is the bitmask to determine conditional restoring
cpmark MASK mask applied to restored packet mark
ZONE is the conntrack zone
CONTROL := reclassify | pipe | drop | continue | ok |
goto chain <CHAIN_INDEX>
Signed-off-by: Kevin Darbyshire-Bryant <ldir@darbyshire-bryant.me.uk>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
the following TDC test case:
b776 - Replace simple action with invalid goto chain control
checks if the kernel correctly validates the 'goto chain' control action,
when it is specified in 'act_simple' rules. The test systematically fails
because the control action is hardcoded in parse_simple(), i.e. it is not
parsed by command line arguments, so its value is constantly TC_ACT_PIPE.
Because of that, the following command:
# tc action add action simple sdata "test" drop index 7
installs an 'act_simple' rule that never drops packets, and whose 'index'
is the first IDR available, plus an 'act_gact' rule with 'index' equal to
7, that drops packets.
Use parse_action_control_dflt(), like we did on many other TC actions, to
make the control action configurable also with 'act_simple'. The expected
results of test b776 are summarized below:
iproute2
v kernel->| 5.1-rc2 (and previous) | 5.1-rc3 (and subsequent)
------------------+-------------------------+-------------------------
5.1.0 | FAIL (bad IDR) | FAIL (bad IDR)
5.1.0(patched) | FAIL (no rule/bad sdata)| PASS
Changes since v1:
- reword commit message, thanks Stephen Hemminger
Fixes: 087f46ee4e ("tc: introduce simple action")
CC: Andrea Claudi <aclaudi@redhat.com>
CC: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
The following operation fails:
% sudo tc actions add action pipe index 1
% sudo tc filter add dev lo parent ffff: \
protocol ip pref 10 u32 match ip src 127.0.0.2 \
flowid 1:10 action gact index 1
Bad action type index
Usage: ... gact <ACTION> [RAND] [INDEX]
Where: ACTION := reclassify | drop | continue | pass | pipe |
goto chain <CHAIN_INDEX> | jump <JUMP_COUNT>
RAND := random <RANDTYPE> <ACTION> <VAL>
RANDTYPE := netrand | determ
VAL : = value not exceeding 10000
JUMP_COUNT := Absolute jump from start of action list
INDEX := index value used
However, passing a control action of gact rule during filter binding works:
% sudo tc filter add dev lo parent ffff: \
protocol ip pref 10 u32 match ip src 127.0.0.2 \
flowid 1:10 action gact pipe index 1
Binding by reference, i.e. by index, has to consistently work with
any tc action.
Since tc is sensitive to the order of keywords passed on the command line,
we can teach gact to skip parsing arguments as soon as it sees 'gact'
followed by 'index' keyword.
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
sscanf truncates read port values silently without any error. As sscanf
man says:
(...) sscanf() conform to C89 and C99 and POSIX.1-2001. These standards
do not specify the ERANGE error.
Replace sscanf with safer get_be16 that returns error when value is out
of range.
Example:
tc filter add dev eth0 protocol ip parent ffff: prio 1 flower ip_proto
tcp dst_port 70000 hw_tc 1
Would result in filter for port 4464 without any warning.
Fixes: 8930840e67 ("tc: flower: Classify packets based port ranges")
Signed-off-by: Lukasz Czapnik <lukasz.czapnik@intel.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
The mirred act admits an optional control action, defaulting
to TC_ACT_PIPE. The parsing code currently emits an error message
if the control action is not provided on the command line, even
if the command itself completes with no error.
This change shuts down the error message, using the appropriate
parsing helper.
Fixes: e67aba5595 ("tc: actions: add helpers to parse and print control actions")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Every tool in the iproute2 package have one or more function to show
an help message to the user. Some of these functions print the help
line by line with a series of printf call, e.g. ip/xfrm_state.c does
60 fprintf calls.
If we group all the calls to a single one and just concatenate strings,
we save a lot of libc calls and thus object size. The size difference
of the compiled binaries calculated with bloat-o-meter is:
ip/ip:
add/remove: 0/0 grow/shrink: 5/15 up/down: 103/-4796 (-4693)
Total: Before=672591, After=667898, chg -0.70%
ip/rtmon:
add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-54 (-54)
Total: Before=48879, After=48825, chg -0.11%
tc/tc:
add/remove: 0/2 grow/shrink: 31/10 up/down: 882/-6133 (-5251)
Total: Before=351912, After=346661, chg -1.49%
bridge/bridge:
add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-459 (-459)
Total: Before=70502, After=70043, chg -0.65%
misc/lnstat:
add/remove: 0/1 grow/shrink: 1/0 up/down: 48/-486 (-438)
Total: Before=9960, After=9522, chg -4.40%
tipc/tipc:
add/remove: 0/0 grow/shrink: 1/1 up/down: 18/-62 (-44)
Total: Before=79182, After=79138, chg -0.06%
While at it, indent some strings which were starting at column 0,
and use tabs where possible, to have a consistent style across helps.
Signed-off-by: Matteo Croce <mcroce@redhat.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
This allows a cycle-time and a cycle-time-extension to be specified.
Specifying a cycle-time will truncate that cycle, so when that instant
is reached, the cycle will start from its beginning.
A cycle-time-extension may cause the last entry of a cycle, just
before the start of a new schedule (the base-time of the "admin"
schedule) to be extended by at maximum "cycle-time-extension"
nanoseconds. The idea of this feauture, as described by the IEEE
802.1Q, is too avoid too narrow gate states.
Example:
tc qdisc change dev IFACE parent root handle 100 taprio \
sched-entry S 0x1 1000000 \
sched-entry S 0x0 2000000 \
sched-entry S 0x1 3000000 \
sched-entry S 0x0 4000000 \
cycle-time-extension 100000 \
cycle-time 9000000 \
base-time 12345678900000000
Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
This allows for a new schedule to be specified during runtime, without
removing the current one.
For that, the semantics of the 'tc qdisc change' operation in the
context of taprio is that if "change" is called and there is a running
schedule, a new schedule is created and the base-time (let's call it
X) of this new schedule is used so at instant X, it becomes the
"current" schedule. So, in short, "change" doesn't change the current
schedule, it creates a new one and sets it up to it becomes the
current one at some point.
In IEEE 802.1Q terms, it means that we have support for the
"Oper" (current and read-only) and "Admin" (future and mutable)
schedules.
Example of creating the first schedule, then adding a new one:
(1)
tc qdisc add dev IFACE parent root handle 100 taprio \
num_tc 1 \
map 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 \
queues 1@0 \
sched-entry S 0x1 1000000 \
sched-entry S 0x0 2000000 \
sched-entry S 0x1 3000000 \
sched-entry S 0x0 4000000 \
base-time 100000000 \
clockid CLOCK_TAI
(2)
tc qdisc change dev IFACE parent root handle 100 taprio \
base-time 7500000000000 \
sched-entry S 0x0 5000000 \
sched-entry S 0x1 5000000 \
It was necessary to fix a bug, so the clockid doesn't need to be
specified when changing the schedule.
Most of the changes are related to make it easier to reuse the same
function for printing the "admin" and "oper" schedules.
Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
sch_plug can be used to perform functional qdisc unit tests
controlling explicitly the queuing behaviour from user-space.
Plug support lacks since its introduction in 2012. This change
introduces basic support, to control the tc status.
v1 -> v2:
- use the SPDX identifier
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
This adds support for the newly added fwmark option to CAKE, which allows
overriding the tin selection from the per-packet firewall marks. The fwmark
field is a bitmask that is applied to the fwmark to select the tin.
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
tc_pie_xstats->prob has a maximum value of (2^64 - 1).
Signed-off-by: Leslie Monis <lesliemonis@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
action m_connmark returns error messages identifying itself as the
'simple' action instead of 'connmark' action. e.g.
tc filter add dev eth0 protocol all u32 match u32 0 0 flowid 1:1 \
action connmark index wrong
simple: Illegal "index"
bad action parsing
parse_action: bad value (3:connmark)!
Illegal "action"
In what is most likely a copy/paste error from the simple action example
code, fix connmark error messages to identify themselves as coming from
connmark.
tc filter add dev eth0 protocol all u32 match u32 0 0 flowid 1:1 \
action connmark index wrong
connmark: Illegal "index"
bad action parsing
parse_action: bad value (3:connmark)!
Illegal "action"
While we're here also fixup the 'Illegal "Zone"' error code to say
'Illegal "zone"' instead of 'Illegal "index"'
Signed-off-by: Kevin Darbyshire-Bryant <ldir@darbyshire-bryant.me.uk>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Tc pedit action with more than two ip6 munge in a row cause infinite
loop.
Example:
$ tc filter add dev eth0 protocol ipv6 parent ffff: \
flower ip_proto sctp \
action pedit ex \
munge ip6 hoplimit set 0x1 \
munge ip6 src set 2001:0db8:0:f101::1 \
munge that cause infinite loop
The example command never returns, instead of failing with parse error
as expected. Pedit ipv6 structure has wrong id, which leads to the
creation linked list with one node in tc/m_pedit.c:get_pedit_kind(),
referring to itself. This node is created if command have two ip6 munge
in a row, and any third ip6 munge will cause infinite loop.
Changing this id from "ipv6" to "ip6" solves the problem.
Fixes: f3e1b2448a ("pedit: Introduce ipv6 support")
Signed-off-by: Dmytro Linkin <dmitrolin@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
As /sys/class/net/<iface>/speed indicates a value in Mbits/sec, the
conversion is necessary to create the correct limits.
This guarantees the same result for the following commands in an
1000Mbit/sec device:
tc class add ... htb rate 500Mbit
tc class add ... htb rate 50%
Fixes: 927e3cfb52 ("tc: B.W limits can now be specified in %.")
Signed-off-by: Marcos Antonio Moraes <marcos.antonio@digirati.com.br>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
The parse_percent_rate function assumed the buffer was 20 characters.
Better to pass length in case the size ever changes.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
If value passed to parse_percent was not valid, it would
leak the dynamic allocation from sscanf.
Fixes: 927e3cfb52 ("tc: B.W limits can now be specified in %.")
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
unlike other TC actions already supporting JSON printout, 'csum' does not
print the value of TCA_KIND in the 'kind' property: remove 'csum' word
from 'csum' property, and add a separate 'kind' property containing the
action name. The human-readable printout is preserved.
Tested with:
# ./tdc.py -c csum
Cc: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
The kernel (and iproute2) don't use the if (NULL == x) style
and instead prefer if (!x)
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
XATTR_SIZE_MAX requires the usage of linux/limits.h; let's include it
Signed-off-by: Hans Dedecker <dedeckeh@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Change the id parameter of the tunnel_key set action from mandatory to
optional.
Some tunneling protocols (e.g. GRE) specify the id as an optional field.
Signed-off-by: Adi Nissim <adin@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Cc: Martin Olsson <martin.olsson+netdev@sentorsecurity.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Cc: David Ahern <dsahern@gmail.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
The incorrect setting of LDFLAGS causes error below:
> em_ipt.o: In function `em_ipt_print_epot':
> em_ipt.c:(.text.em_ipt_print_epot+0x2e): undefined reference to
> `xtables_init_all'
em_ipt.c gets involved when TC_CONFIG_XT=y, which requires xtables,
while tc/Makefile doesn't pass flags correctly. It adds '-lxtables'
to LDFLAGS instead of LDLIBS.
Fixes: dd296215 ("tc: add em_ipt ematch for calling xtables matches from tc matching context")
Signed-off-by: Syrone Wong <wong.syrone@gmail.com>
Acked-by: Eyal Birger <eyal.birger@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
The argument to print_0xhex is converted to unsigned long long
so the format string give for normal printout has to be some
variant of %llx. Otherwise, bogus values will be printed on
32 bit platforms.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Added support for filtering based on port ranges.
UAPI changes have been accepted into net-next.
Example:
1. Match on a port range:
-------------------------
$ tc filter add dev enp4s0 protocol ip parent ffff:\
prio 1 flower ip_proto tcp dst_port 20-30 skip_hw\
action drop
$ tc -s filter show dev enp4s0 parent ffff:
filter protocol ip pref 1 flower chain 0
filter protocol ip pref 1 flower chain 0 handle 0x1
eth_type ipv4
ip_proto tcp
dst_port 20-30
skip_hw
not_in_hw
action order 1: gact action drop
random type none pass val 0
index 1 ref 1 bind 1 installed 85 sec used 3 sec
Action statistics:
Sent 460 bytes 10 pkt (dropped 10, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
2. Match on IP address and port range:
--------------------------------------
$ tc filter add dev enp4s0 protocol ip parent ffff:\
prio 1 flower dst_ip 192.168.1.1 ip_proto tcp dst_port 100-200\
skip_hw action drop
$ tc -s filter show dev enp4s0 parent ffff:
filter protocol ip pref 1 flower chain 0 handle 0x2
eth_type ipv4
ip_proto tcp
dst_ip 192.168.1.1
dst_port 100-200
skip_hw
not_in_hw
action order 1: gact action drop
random type none pass val 0
index 2 ref 1 bind 1 installed 58 sec used 2 sec
Action statistics:
Sent 920 bytes 20 pkt (dropped 20, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
v6:
Modified to change json output format as object for sport/dport.
"dst_port":{
"start":2000,
"end":6000
},
"src_port":{
"start":50,
"end":60
}
v5:
Simplified some code and used 'sscanf' for parsing. Removed
space in output format.
v4:
Added man updates explaining filtering based on port ranges.
Removed 'range' keyword.
v3:
Modified flower_port_range_attr_type calls.
v2:
Addressed Jiri's comment to sync output format with input
Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
Kernel commit 48872c11b772 ("net_sched: sch_fq: add dctcp-like marking")
added support for TCA_FQ_CE_THRESHOLD attribute.
This patch adds iproute2 support for it.
It also makes sure fq_print_xstats() can deal with smaller tc_fq_qd_stats
structures given by older kernels.
Usage :
FQATTRS="ce_threshold 4ms"
TXQS=8
for ETH in eth0
do
tc qd del dev $ETH root 2>/dev/null
tc qd add dev $ETH root handle 1: mq
for i in `seq 1 $TXQS`
do
tc qd add dev $ETH parent 1:$i fq $FQATTRS
done
done
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
Kernel now support setting ECN and HARDDROP flags per-virtual
queue. Allow users to tweak the settings, and print them on
dump.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
Kernel GRED qdisc supports ECN marking, and the harddrop flag
but setting and dumping this flag is not possible with iproute2.
Add the support.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
Use the extended attributes with extra and better stats, when
possible.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
Printing GRED statistics is long and deserves a function on its own.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
Number of qdiscs use the same set of flags to control shared RED
implementation. Add a helper for printing those flags.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
The comment about providing a proper message seems similar to
the comment in the kernel which says:
/* hack -- fix at some point with proper message
This is how we indicate to tc that there is no VQ
at this DP */
it's unclear what that message would be, and whether it's needed.
Remove the confusing comment.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
Added support for filtering based on port ranges.
UAPI changes have been accepted into net-next.
Example:
1. Match on a port range:
-------------------------
$ tc filter add dev enp4s0 protocol ip parent ffff:\
prio 1 flower ip_proto tcp dst_port range 20-30 skip_hw\
action drop
$ tc -s filter show dev enp4s0 parent ffff:
filter protocol ip pref 1 flower chain 0
filter protocol ip pref 1 flower chain 0 handle 0x1
eth_type ipv4
ip_proto tcp
dst_port range 20-30
skip_hw
not_in_hw
action order 1: gact action drop
random type none pass val 0
index 1 ref 1 bind 1 installed 85 sec used 3 sec
Action statistics:
Sent 460 bytes 10 pkt (dropped 10, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
2. Match on IP address and port range:
--------------------------------------
$ tc filter add dev enp4s0 protocol ip parent ffff:\
prio 1 flower dst_ip 192.168.1.1 ip_proto tcp dst_port range 100-200\
skip_hw action drop
$ tc -s filter show dev enp4s0 parent ffff:
filter protocol ip pref 1 flower chain 0 handle 0x2
eth_type ipv4
ip_proto tcp
dst_ip 192.168.1.1
dst_port range 100-200
skip_hw
not_in_hw
action order 1: gact action drop
random type none pass val 0
index 2 ref 1 bind 1 installed 58 sec used 2 sec
Action statistics:
Sent 920 bytes 20 pkt (dropped 20, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
v3:
Modified flower_port_range_attr_type calls.
v2:
Addressed Jiri's comment to sync output format with input
Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
The pedit callback structure table should be iniatialized using
structure initialization to avoid structure changes problems.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
The tc util library parse/print has functions only used locally
(and some dead code removed).
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
The print handling is only used in tc/m_ematch.c
Remove unused function to print_ematch_tree.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
u32 uses NEXT_ARG() incorrectly when parsing skip_hw and skip_sw
flags. NEXT_ARG() ensures there is another argument on the command
line, and is used in handling <keyword> <value> syntax to move past
<keyword> and ensure there is a <value> to read.
Commit 5e5b3008d1 ("tc: f_u32: Add support for skip_hw and skip_sw
flags") seems to have copy pasted the handling from the previous
command - "police", which needs an extra parameter and is kind of
special due to the use of parse_police() helper.
The combination of NEXT_ARG() and continue worked fine as long as
skip_sw/skip_hw wasn't last, e.g.:
$ tc filter add dev dummy0 ingress prio 101 protocol ipv6 \
u32 match ip6 priority 0xa0 0xe0 skip_hw action pass
But would fail if it was last:
$ tc filter add dev dummy0 ingress prio 101 protocol ipv6 \
u32 match ip6 priority 0xa0 0xe0 flowid :1 skip_hw
Command line is not complete. Try option "help"
Remove the NEXT_ARG()s and the continues, and let the argc--; argv++;
at the end of the loop do its job.
Fixes: 5e5b3008d1 ("tc: f_u32: Add support for skip_hw and skip_sw flags")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
When building Debian packages pre-processor flags are passed via
CPPFLAGS, as the convention indicates. Specifically, the hardening
-D_FORTIFY_SOURCE=2 flag is used.
Pass CPPFLAGS to all calls of QUIET_CC together with CFLAGS.
Signed-off-by: Luca Boccassi <bluca@debian.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
This is simpler and cleaner, and avoids having to include the header
from every file where the functions are used. The prototypes of the
internal implementation are in this header, so utils.h will have to be
included anyway for those.
Fixes: 508f3c231e ("Use libbsd for strlcpy if available")
Signed-off-by: Luca Boccassi <bluca@debian.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
If libc does not provide strlcpy check for libbsd with pkg-config to
avoid relying on inline version.
Signed-off-by: Luca Boccassi <bluca@debian.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Value of 'default' is assumed to be hexadecimal when parsing, so
consequently it should be printed in hex as well. This is a regression
introduced when adding JSON output.
As requested, also change JSON output to print the value as hex string.
Fixes: f354fa6aa5 ("tc: jsonify htb qdisc")
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
All these assignments are later overwritten without reading in between,
so just drop them.
Fixes: 485d0c6001 ("tc: Add batchsize feature for filter and actions")
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
No function, filter, or print function uses the sockaddr_nl arg,
so just drop it.
Signed-off-by: David Ahern <dsahern@gmail.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Print limits correctly in JSON context.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
This traffic scheduler allows traffic classes states (transmission
allowed/not allowed, in the simplest case) to be scheduled, according
to a pre-generated time sequence. This is the basis of the IEEE
802.1Qbv specification.
Example configuration:
tc qdisc replace dev enp3s0 parent root handle 100 taprio \
num_tc 3 \
map 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 \
queues 1@0 1@1 2@2 \
base-time 1528743495910289987 \
sched-entry S 01 300000 \
sched-entry S 02 300000 \
sched-entry S 04 300000 \
clockid CLOCK_TAI
The configuration format is similar to mqprio. The main difference is
the presence of a schedule, built by multiple "sched-entry"
definitions, each entry has the following format:
sched-entry <CMD> <GATE MASK> <INTERVAL>
The only supported <CMD> is "S", which means "SetGateStates",
following the IEEE 802.1Qbv-2015 definition (Table 8-6). <GATE MASK>
is a bitmask where each bit is a associated with a traffic class, so
bit 0 (the least significant bit) being "on" means that traffic class
0 is "active" for that schedule entry. <INTERVAL> is a time duration
in nanoseconds that specifies for how long that state defined by <CMD>
and <GATE MASK> should be held before moving to the next entry.
This schedule is circular, that is, after the last entry is executed
it starts from the first one, indefinitely.
The other parameters can be defined as follows:
- base-time: specifies the instant when the schedule starts, if
'base-time' is a time in the past, the schedule will start at
base-time + (N * cycle-time)
where N is the smallest integer so the resulting time is greater
than "now", and "cycle-time" is the sum of all the intervals of the
entries in the schedule;
- clockid: specifies the reference clock to be used;
The parameters should be similar to what the IEEE 802.1Q family of
specification defines.
Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: Jesus Sanchez-Palencia <jesus.sanchez-palencia@intel.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
Recently flower classifier was updated to expose count of devices that
filter is offloaded to. Add support to print this counter as 'in_hw_count'.
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Allow matching on options in Geneve tunnel headers.
The options can be described in the form
CLASS:TYPE:DATA/CLASS_MASK:TYPE_MASK:DATA_MASK, where CLASS is
represented as a 16bit hexadecimal value, TYPE as an 8bit
hexadecimal value and DATA as a variable length hexadecimal value.
e.g.
# ip link add name geneve0 type geneve dstport 0 external
# tc qdisc add dev geneve0 ingress
# tc filter add dev geneve0 protocol ip parent ffff: \
flower \
enc_src_ip 10.0.99.192 \
enc_dst_ip 10.0.99.193 \
enc_key_id 11 \
geneve_opts 0102:80:1122334421314151/ffff:ff:ffffffffffffffff \
ip_proto udp \
action mirred egress redirect dev eth1
Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
Conflicts:
ip/iproute_lwtunnel.c
In addition to merge conflict between bd59e5b151 and 94a8722f2f,
updated the code added by the latter commit based on the change of the
former (ie., added ret = to the new rta_addattr_l).
Signed-off-by: David Ahern <dsahern@gmail.com>
Similar to the previous patch for no-split-gso, the negative keywords for
'nat', 'wash' and 'ack-filter' were not printed either. Add those well.
Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
When the GSO splitting was turned into dual split-gso/no-split-gso options,
the printing of the latter was left out. Add that, so output is consistent
with the options passed.
Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Common pattern in iproute commands is to print a line seperator
in non-json mode. Make that a simple function.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Print the name of the argument that wasn't understood.
Signed-off-by: Caleb Raitto <caraitto@google.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Extend slotting with support for non-uniform distributions. This is
similar to netem's non-uniform distribution delay feature.
Syntax:
slot distribution DISTRIBUTION DELAY JITTER [packets MAX_PACKETS] \
[bytes MAX_BYTES]
The syntax and use of the distribution table is the same as in the
non-uniform distribution delay feature. A file DISTRIBUTION must be
present in TC_LIB_DIR (e.g. /usr/lib/tc) containing numbers scaled by
NETEM_DIST_SCALE. A random value x is selected from the table and it
takes DELAY + ( x * JITTER ) as delay. Correlation between values is not
supported.
Examples:
Normal distribution delay with mean = 800us and stdev = 100us.
> tc qdisc add dev eth0 root netem slot distribution normal \
800us 100us
Optionally set the max slot size in bytes and/or packets.
> tc qdisc add dev eth0 root netem slot distribution normal \
800us 100us bytes 64k packets 42
Signed-off-by: Yousuk Seung <ysseung@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Dave Taht <dave.taht@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
Slotting is a crude approximation of the behaviors of shared media such
as cable, wifi, and LTE, which gather up a bunch of packets within a
varying delay window and deliver them, relative to that, nearly all at
once.
It works within the existing loss, duplication, jitter and delay
parameters of netem. Some amount of inherent latency must be specified,
regardless.
The new "slot" parameter specifies a minimum and maximum delay between
transmission attempts.
The "bytes" and "packets" parameters can be used to limit the amount of
information transferred per slot.
Examples of use:
tc qdisc add dev eth0 root netem delay 200us \
slot 800us 10ms bytes 64k packets 42
A more correct example, using stacked netem instances and a packet limit
to emulate a tail drop wifi queue with slots and variable packet
delivery, with a 200Mbit isochronous underlying rate, and 20ms path
delay:
tc qdisc add dev eth0 root handle 1: netem delay 20ms rate 200mbit \
limit 10000
tc qdisc add dev eth0 parent 1:1 handle 10:1 netem delay 200us \
slot 800us 10ms bytes 64k packets 42 limit 512
Signed-off-by: Yousuk Seung <ysseung@google.com>
Signed-off-by: Dave Taht <dave.taht@gmail.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
Using a 32 bit field to represent time in nanoseconds results in a
maximum value of about 4.3 seconds, which is well below many observed
delays in WiFi and LTE, and barely in the ballpark for a trip past the
Earth's moon, Luna.
Using 64 bit time fields in nanoseconds allows us to simulate
network diameters of several hundred light-years. However, only
conversions to and from ns, us, ms, and seconds are provided.
The iproute2 64 bit api uses signed values for time. Being able to
represent positive or negative time allows us to calculate +/- deltas
between, for example, the CLOCK_TAI and CLOCK_REALTIME clocks.
Time related utility functions in tc_util.c are moved to lib/utils.c.
Signed-off-by: Yousuk Seung <ysseung@google.com>
Signed-off-by: Dave Taht <dave.taht@gmail.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
Since introduction of htb module, this variable has never been used.
Signed-off-by: Florent Fourcot <florent.fourcot@wifirst.fr>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
These are primarily fixes for "string is not string literal" warnings
/ errors (with -Werror -Wformat-nonliteral). This should be a no-op
change. I had to replace couple of print helper functions with the
code they call as it was becoming harder to eliminate these warnings,
however these helpers were used only at couple of places, so no
major change as such.
Signed-off-by: Mahesh Bandewar <maheshb@google.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Allow for -color={never,auto,always} to have colored output disabled,
enabled only if stdout is a terminal or enabled regardless of stdout
state.
Signed-off-by: Phil Sutter <phil@nwl.cc>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Instead of calling enable_color() conditionally with identical check in
three places, introduce check_enable_color() which does it in one place.
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: David Ahern <dsahern@gmail.com>
The check used binary instead of boolean AND, which means colored output
was enabled only if the number of specified '-color' flags was odd.
Fixes: 2d165c0811 ("tc: implement color output")
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: David Ahern <dsahern@gmail.com>
sch_skbprio is a qdisc that prioritizes packets according to their skb->priority
field. Under congestion, it drops already-enqueued lower priority packets to
make space available for higher priority packets. Skbprio was conceived as a
solution for denial-of-service defenses that need to route packets with
different priorities as a means to overcome DoS attacks.
Signed-off-by: Nishanth Devarajan <ndev2021@gmail.com>
Reviewed-by: Michel Machado <michel@digirati.com.br>
Signed-off-by: David Ahern <dsahern@gmail.com>
This patch makes sch_cake's gso/gro splitting configurable
from userspace.
To disable breaking apart superpackets in sch_cake:
tc qdisc replace dev whatever root cake no-split-gso
to enable:
tc qdisc replace dev whatever root cake split-gso
Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk>
Signed-off-by: Dave Taht <dave.taht@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
Add matching on tos/ttl of the IP tunnel headers.
For example, here's decap rule that matches on the tunnel tos:
tc filter add dev vxlan_sys_4789 protocol ip parent ffff: prio 10 flower \
enc_src_ip 192.168.10.2 enc_dst_ip 192.168.10.1 enc_key_id 100 enc_dst_port 4789 enc_tos 0x30 \
src_mac e4:11:22:33:44:70 dst_mac e4:11:22:33:44:50 \
action tunnel_key unset \
action mirred egress redirect dev eth0_0
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
Allow to set tos and ttl for the tunnel.
For example, here's encap rule that sets tos to the tunnel:
tc filter add dev eth0_0 protocol ip parent ffff: prio 10 flower \
src_mac e4:11:22:33:44:50 dst_mac e4:11:22:33:44:70 \
action tunnel_key set src_ip 192.168.10.1 dst_ip 192.168.10.2 id 100 dst_port 4789 tos 0x30 \
action mirred egress redirect dev vxlan_sys_4789
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
This is consistent with the other multi-word parameters. Also change the
JSON output to be consistent with way it is formatted for the other
options.
Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk>
Signed-off-by: David Ahern <dsahern@gmail.com>
Here we are partially reverting commit c14f9d92ee
"treewide: Use addattr_nest()/addattr_nest_end() to handle nested
attributes" .
As discussed in [1], changing from the 'manually' coded version that
used addattr_l() to addattr_nest_compat() wasn't functionally
equivalent, because now the messages have extra fields appended to it.
This introduced a regression since the implementation of parse_attr()
from both mqprio and netem can't handle this new message format.
Without this fix, mqprio returns an error. netem won't return an error
but its internal configuration ends up wrong.
As an example, this can be reproduced by the following commands when
this patch is not applied:
1) mqprio
$ tc qdisc replace dev enp3s0 parent root handle 100 mqprio \
num_tc 3 map 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 \
queues 1@0 1@1 2@2 hw 0
RTNETLINK answers: Numerical result out of range
2) netem
$ tc qdisc add dev enp3s0 root netem rate 5kbit 20 100 5 \
distribution normal latency 1 1
$ tc -s qdisc
(...)
qdisc netem 8001: dev enp3s0 root refcnt 9 limit 1000 delay 0us 0us
Sent 402 bytes 1 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
(...)
With this patch applied, the tc -s qdisc command above for netem instead
reads:
(...)
qdisc netem 8002: dev enp3s0 root refcnt 9 limit 1000 delay 0us 0us \
rate 5Kbit packetoverhead 20 cellsize 100 celloverhead 5
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
(...)
[1] https://patchwork.ozlabs.org/patch/867860/#1893405
Fixes: c14f9d92ee ("treewide: Use addattr_nest()/addattr_nest_end() to handle nested attributes")
Reported-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: Jesus Sanchez-Palencia <jesus.sanchez-palencia@intel.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
sch_cake is intended to squeeze the most bandwidth and latency out of even
the slowest ISP links and routers, while presenting an API simple enough
that even an ISP can configure it.
Example of use on a cable ISP uplink:
tc qdisc add dev eth0 cake bandwidth 20Mbit nat docsis ack-filter
To shape a cable download link (ifb and tc-mirred setup elided)
tc qdisc add dev ifb0 cake bandwidth 200mbit nat docsis ingress wash besteffort
Cake is filled with:
* A hybrid Codel/Blue AQM algorithm, "Cobalt", tied to an FQ_Codel
derived Flow Queuing system, which autoconfigures based on the bandwidth.
* A novel "triple-isolate" mode (the default) which balances per-host
and per-flow FQ even through NAT.
* An deficit based shaper, that can also be used in an unlimited mode.
* 8 way set associative hashing to reduce flow collisions to a minimum.
* A reasonable interpretation of various diffserv latency/loss tradeoffs.
* Support for zeroing diffserv markings for entering and exiting traffic.
* Support for interacting well with Docsis 3.0 shaper framing.
* Support for DSL framing types and shapers.
* Support for ack filtering.
* Extensive statistics for measuring, loss, ecn markings, latency variation.
Various versions baking have been available as an out of tree build for
kernel versions going back to 3.10, as the embedded router world has been
running a few years behind mainline Linux. A stable version has been
generally available on lede-17.01 and later.
sch_cake replaces a combination of iptables, tc filter, htb and fq_codel
in the sqm-scripts, with sane defaults and vastly simpler configuration.
Cake's principal author is Jonathan Morton, with contributions from
Kevin Darbyshire-Bryant, Toke Høiland-Jørgensen, Sebastian Moeller,
Ryan Mounce, Tony Ambardar, Dean Scarff, Nils Andreas Svee, Dave Täht,
and Loganaden Velvindron.
Testing from Pete Heist, Georgios Amanakis, and the many other members of
the cake@lists.bufferbloat.net mailing list.
Signed-off-by: Dave Taht <dave.taht@gmail.com>
Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk>
Signed-off-by: David Ahern <dsahern@gmail.com>
The new action inheritdsfield copies the field DS of
IPv4 and IPv6 packets into skb->priority. This enables
later classification of packets based on the DS field.
v4:
* Make tc use netlink helper functions
v3:
* Make flag represented in JSON output as a null value
v2:
* Align the output syntax with the input syntax
* Fix the style issues
Original idea by Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Qiaobin Fu <qiaobinf@bu.edu>
Reviewed-by: Michel Machado <michel@digirati.com.br>
Reviewed-by: Cong Wang <xiyou.wangcong@gmail.com>
Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Reviewed-by: Stephen Hemminger <stephen@networkplumber.org>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
The "Earliest TxTime First" (ETF) queueing discipline allows precise
control of the transmission time of packets by providing a sorted
time-based scheduling of packets.
The syntax is:
tc qdisc add dev DEV parent NODE etf delta <DELTA>
clockid <CLOCKID> [offload] [deadline_mode]
Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: Jesus Sanchez-Palencia <jesus.sanchez-palencia@intel.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
Conversion to print stats in JSON forgot to remove existing
fprintf.
Fixes: 4fcec7f366 ("tc: jsonify stats2")
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
A commandline like 'tc -d class show dev dev-name' does not
display value of prio and quantum option when we use htb qdisc.
This patch fixes the bug.
Signed-off-by: Fumihiko Kakuma <kakuma@valinux.co.jp>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Example output is of tos and ttl.
Befoe this fix the format used %x caused output of the pointer
instead of the intended string created in the out variable.
Fixes: e28b88a464 ("tc: jsonify flower filter")
Signed-off-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Allow setting tunnel options using the act_tunnel_key action.
Options are expressed as class:type:data and multiple options
may be listed using a comma delimiter.
# ip link add name geneve0 type geneve dstport 0 external
# tc qdisc add dev eth0 ingress
# tc filter add dev eth0 protocol ip parent ffff: \
flower indev eth0 \
ip_proto udp \
action tunnel_key \
set src_ip 10.0.99.192 \
dst_ip 10.0.99.193 \
dst_port 6081 \
id 11 \
geneve_opts 0102:80:00800022,0102:80:00800022 \
action mirred egress redirect dev geneve0
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
When sending accumulated compound command results an error, check 'force'
option before exiting. Move return code check after putting batch bufs and
freeing iovs to prevent memory leak. Break from loop, instead of returning
error code to allow cleanup at the end of batch function. Don't reset ret
code on each iteration.
Fixes: 485d0c6001 ("tc: Add batchsize feature for filter and actions")
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Chris Mi <chrism@mellanox.com>
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
In order to make TDC tests match the output patterns, the missing space
character must be added in the mode output string.
Fixes: 8744c5d338 ("tc: jsonify ife action")
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: David Ahern <dsahern@gmail.com>