Commit Graph

5373 Commits

Author SHA1 Message Date
Bjarni Ingi Gislason
2df0dc2437 libnetlink.3: display section numbers in roman font, not boldface
Typeset section numbers in roman font, see man-pages(7).

###

  Details:

Output is from: test-groff -b -mandoc -T utf8 -rF0 -t -w w -z

  [ "test-groff" is a developmental version of "groff" ]

<./man/man3/libnetlink.3>:53 (macro BR): only 1 argument, but more are expected
<./man/man3/libnetlink.3>:132 (macro BR): only 1 argument, but more are expected
<./man/man3/libnetlink.3>:134 (macro BR): only 1 argument, but more are expected
<./man/man3/libnetlink.3>:197 (macro BR): only 1 argument, but more are expected
<./man/man3/libnetlink.3>:198 (macro BR): only 1 argument, but more are expected

Signed-off-by: Bjarni Ingi Gislason <bjarniig@rhi.hi.is>
2020-07-06 10:46:23 -07:00
David Ahern
a5c9d01c5c Merge branch 'rdma-raw-format-dumps' into next
Leon Romanovsky  says:

====================

The following series adds support to get the RDMA resource data in RAW
format. The main motivation for doing this is to enable vendors to
return the entire QP/CQ/MR data without a need from the vendor to set
each field separately.

====================

Signed-off-by: David Ahern <dsahern@kernel.org>
2020-07-05 18:11:49 +00:00
Maor Gottlieb
e2bbf737e6 rdma: Add support to get MR in raw format
Add the required support to print MR data in raw format.
Example:

$rdma res show mr dev mlx5_1 mrn 2 -r -j
[{"ifindex":7,"ifname":"mlx5_1",
"data":[0,4,255,254,0,0,0,0,0,0,0,0,16,28,0,216,...]}]

Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2020-07-05 18:11:37 +00:00
Maor Gottlieb
94323e9611 rdma: Add support to get CQ in raw format
Add the required support to print CQ data in raw format.
Example:

$rdma res show cq dev mlx5_2 cqn 1 -r -j
[{"ifindex":8,"ifname":"mlx5_2",
"data":[0,4,255,254,0,0,0,0,0,0,0,0,16,28,...]}]

Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2020-07-05 18:11:33 +00:00
Maor Gottlieb
7c01e0fc9c rdma: Add support to get QP in raw format
Add 'raw' argument to get the resource in raw format.
When RDMA_NLDEV_ATTR_RES_RAW is set in the netlink message,
then the resource fields are in raw format, print it as byte array.

Example:
$rdma res show qp link rocep0s12f0/1 lqpn 1137 -j -r
[{"ifindex":7,"ifname":"mlx5_1","port":1,
"data":[0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...]}]

Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2020-07-05 18:11:00 +00:00
Maor Gottlieb
8f23492823 rdma: update uapi headers
Update rdma_netlink.h file upto kernel commit 65959522f806
("RDMA: Add support to dump resource tracker in RAW format")

Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2020-07-05 18:10:50 +00:00
David Ahern
79ea01927c Merge branch 'tc-qevents' into next
Petr Machata  says:

====================

To allow configuring user-defined actions as a result of inner workings of
a qdisc, a concept of qevents was recently introduced to the kernel.
Qevents are attach points for TC blocks, where filters can be put that are
executed as the packet hits well-defined points in the qdisc algorithms.
The attached blocks can be shared, in a manner similar to clsact ingress
and egress blocks, arbitrary classifiers with arbitrary actions can be put
on them, etc.

For example:

 # tc qdisc add dev eth0 root handle 1: \
	red limit 500K avpkt 1K qevent early_drop block 10
 # tc filter add block 10 \
	matchall action mirred egress mirror dev eth1

This patch set introduces the corresponding iproute2 support. Patch #1 adds
the new netlink attribute enumerators. Patch #2 adds a set of helpers to
implement qevents, and #3 adds a generic documentation to tc.8. Patch #4
then adds two new qevents to the RED qdisc: mark and early_drop.

====================

Signed-off-by: David Ahern <dsahern@kernel.org>
2020-07-05 15:45:48 +00:00
Petr Machata
d0e4504385 tc: q_red: Add support for qevents "mark" and "early_drop"
The "early_drop" qevent matches packets that have been early-dropped. The
"mark" qevent matches packets that have been ECN-marked.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2020-07-05 15:37:49 +00:00
Petr Machata
3cf51fb3c8 man: tc: Describe qevents
Add some general remarks about qevents.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2020-07-05 15:37:45 +00:00
Petr Machata
01bb0bcd00 tc: Add helpers to support qevent handling
Introduce a set of helpers to make it easy to add support for qevents into
qdisc.

The idea behind this is that qevent types will be generally reused between
qdiscs, rather than each having a completely idiosyncratic set of qevents.
The qevent module holds functions for parsing, dumping and formatting of
these common qevent types, and for dispatch to the appropriate set of
handlers based on the qevent name.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2020-07-05 15:37:27 +00:00
Po Liu
bc4d9f982f action police: make 'mtu' could be set independently in police action
Current police action must set 'rate' and 'burst'. 'mtu' parameter
set the max frame size and could be set alone without 'rate' and 'burst'
in some situation. Offloading to hardware for example, 'mtu' could limit
the flow max frame size.

Signed-off-by: Po Liu <po.liu@nxp.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2020-07-05 15:34:04 +00:00
Po Liu
3c5570706b action police: change the print message quotes style
Change the double quotes to single quotes in fprintf message to make it
more readable.

Signed-off-by: Po Liu <po.liu@nxp.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2020-07-05 15:33:59 +00:00
Alexandre Cassen
30f3beea0d add support to keepalived rtm_protocol
Following inclusion in net-next, extend rtnl_rtprot_tab and rt_protos
to support Keepalived.

Signed-off-by: Alexandre Cassen <acassen@gmail.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2020-07-05 15:03:45 +00:00
David Ahern
482e463d6c Merge branch 'devlink-port-mac-addr' into next
Parav Pandit  says:

====================

Currently ip link set dev <pfndev> vf <vf_num> <param> <value> has
few below limitations.

1. Command is limited to set VF parameters only.
It cannot set the default MAC address for the PCI PF.

2. It can be set only on system where PCI SR-IOV is supported.
In smartnic based system, eswitch of a NIC resides on a different
embedded cpu which has the VF and PF representors for the SR-IOV
support on a host system in which this smartnic is plugged-in.

3. It cannot setup the function attributes of sub-function described
in detail in comprehensive RFC [1] and [2].

This series covers the first small part to let user query and set MAC
address (hardware address) of a PCI PF/VF which is represented by
devlink port.

[1] https://lore.kernel.org/netdev/20200519092258.GF4655@nanopsycho/
[2] https://marc.info/?l=linux-netdev&m=158555928517777&w=2

====================

Signed-off-by: David Ahern <dsahern@kernel.org>
2020-07-05 14:49:53 +00:00
Parav Pandit
4dca81e9a8 devlink: Support setting port function hardware address
Support setting devlink port function hardware address.

Example of a PCI VF port which supports a port function:
Set hardware address of the VF's port function.

$ devlink port show pci/0000:06:00.0/2
pci/0000:06:00.0/2: type eth netdev enp6s0pf0vf1 flavour pcivf pfnum 0 vfnum 1
  function:
    hw_addr 00:00:00:00:00:00

$ devlink port function set pci/0000:06:00.0/2 hw_addr 00:11:22:33:44:55

$ devlink port show pci/0000:06:00.0/2
pci/0000:06:00.0/2: type eth netdev enp6s0pf0vf1 flavour pcivf pfnum 0 vfnum 1
  function:
    hw_addr 00:11:22:33:44:55

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2020-07-05 14:49:32 +00:00
Parav Pandit
b3adafd154 devlink: Support querying hardware address of port function
Add support to query the hardware address of function represented
by devlink port function.

Example of a PCI VF port which supports a port function:
$ devlink port show pci/0000:06:00.0/2
pci/0000:06:00.0/2: type eth netdev enp6s0pf0vf1 flavour pcivf pfnum 0 vfnum 1
  function:
    hw_addr 00:11:22:33:44:66

$ devlink port show pci/0000:06:00.0/2 -jp
{
    "port": {
        "pci/0000:06:00.0/2": {
            "type": "eth",
            "netdev": "enp6s0pf0vf1",
            "flavour": "pcivf",
            "pfnum": 0,
            "vfnum": 1,
            "function": {
                "hw_addr": "00:11:22:33:44:66"
            }
        }
    }
}

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2020-07-05 14:49:22 +00:00
Parav Pandit
2de449df19 devlink: Move devlink port code at start to reuse
To reuse print routines for port function in subsequent patch, move
print routine specific to devlink device at start of the file.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2020-07-05 14:48:34 +00:00
David Ahern
e17466e484 Update kernel headers
Update kernel headers to commit:
   e1f046704404 ("Merge branch 'qlogic-use-generic-power-management'")

Signed-off-by: David Ahern <dsahern@kernel.org>
2020-07-05 14:33:15 +00:00
Stephen Hemminger
2f31d12a25 man/tc: remove obsolete reference to ipchains
It isn't Linux 2.2 anymore.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-06-24 12:13:46 -07:00
Roi Dayan
473d18e219 ip address: Fix loop initial declarations are only allowed in C99
On some distros, i.e. rhel 7.6, compilation fails with the following:

ipaddress.c: In function ‘lookup_flag_data_by_name’:
ipaddress.c:1260:2: error: ‘for’ loop initial declarations are only allowed in C99 mode
  for (int i = 0; i < ARRAY_SIZE(ifa_flag_data); ++i) {
  ^
ipaddress.c:1260:2: note: use option -std=c99 or -std=gnu99 to compile your code

This commit fixes the single place needed for compilation to pass.

Fixes: 9d59c86e57 ("iproute2: ip addr: Organize flag properties structurally")
Signed-off-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-06-11 15:05:20 -07:00
Stephen Hemminger
3d66d83d25 uapi: update to magic.h
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-06-11 09:52:38 -07:00
Ido Schimmel
abda1e9d2b devlink: Add 'mirror' trap action
Allow setting 'mirror' trap action for traps that support it. Extend the
devlink-trap man page and bash completion accordingly.

Example:

# devlink -jp trap show netdevsim/netdevsim10 trap igmp_query
{
    "trap": {
        "netdevsim/netdevsim10": [ {
                "name": "igmp_query",
                "type": "control",
                "generic": true,
                "action": "mirror",
                "group": "mc_snooping"
            } ]
    }
}

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-06-11 09:51:10 -07:00
Ido Schimmel
fd71244a20 devlink: Add 'control' trap type
This type is used for traps that trap control packets such as ARP
request and IGMP query to the CPU.

Example:

# devlink -jp trap show netdevsim/netdevsim10 trap igmp_v1_report
{
    "trap": {
        "netdevsim/netdevsim10": [ {
                "name": "igmp_v1_report",
                "type": "control",
                "generic": true,
                "action": "trap",
                "group": "mc_snooping"
            } ]
    }
}

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-06-11 09:51:10 -07:00
Stephen Hemminger
12fafa27c7 devlink: update include files
Use the tool iwyu to get more complete list of includes for
all the bits used by devlink.

This should also fix build with musl libc.

Fixes: c4dfddccef ("fix JSON output of mon command")
Reported-off-by: Dan Robertson <dan@dlrobertson.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-06-11 09:49:46 -07:00
Roopa Prabhu
468f787f64 bridge: support for nexthop id in fdb entries
This patch adds support to assign a nexthop group
id to an fdb entry.

$bridge fdb add 02:02:00:00:00:13 dev vx10 nhid 102 self

Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-06-11 15:52:58 +00:00
Roopa Prabhu
a56d17463c ipnexthop: support for fdb nexthops
This patch adds support to add and delete
ecmp nexthops of type fdb. Such nexthops can
be linked to vxlan fdb entries.

$ip nexthop add id 12 via 172.16.1.2 fdb
$ip nexthop add id 13 via 172.16.1.3 fdb
$ip nexthop add id 102 group 12/13 fdb

$bridge fdb add 02:02:00:00:00:13 dev vx10 nhid 102 self

Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-06-11 15:52:29 +00:00
David Ahern
5f6f17db3b Merge branch 'master' into next
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-06-08 14:40:54 +00:00
Stephen Hemminger
e4932ae6b3 uapi: update headers
Update kernel headers from 5.8.0 merge

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-06-05 08:36:54 -07:00
Stephen Hemminger
0a5dbbeddb Merge git://git.kernel.org/pub/scm/network/iproute2/iproute2-next 2020-06-05 08:33:29 -07:00
Stephen Hemminger
1bfa3b3f66 v5.7.0 2020-06-02 20:35:00 -07:00
Donald Sharp
2c78aba2fb nexthop: Fix Deletion display
Actually display that deletions are happening
when monitoring nexthops.

Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-06-01 08:08:46 -07:00
Stephen Hemminger
6facadcfb6 uapi: fix comment in xfrm.h
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-06-01 08:07:02 -07:00
Ian K. Coolidge
5413a735a6 iproute2: ip addr: Add support for setting 'optimistic'
optimistic DAD is controllable via sysctl for an interface
or all interfaces on the system. This would affect addresses
added by the kernel only.

Recent kernels, however, have enabled support for adding optimistic
address via userspace. This plumbs that support.

Signed-off-by: David Ahern <dsahern@gmail.com>
2020-05-31 23:01:33 +00:00
Ian K. Coolidge
9d59c86e57 iproute2: ip addr: Organize flag properties structurally
This creates a nice systematic way to check that the various flags are
mutable from userspace and that the address family is valid.

Mutability properties are preserved to avoid introducing any behavioral
change in this CL. However, previously, immutable flags were ignored and
fell through to this confusing error:

Error: either "local" is duplicate, or "dadfailed" is a garbage.

But now, they just warn more explicitly:

Warning: dadfailed option is not mutable from userspace
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-05-31 23:01:22 +00:00
Roman Mashak
bd4b8c632e tc: report time an action was first used
Have print_tm() dump firstuse value along with install, lastuse
and expires.

v2: Resubmit after 'master' merged into next

Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-05-31 22:51:19 +00:00
Andrea Claudi
354efaec38 bpf: Fixes a snprintf truncation warning
gcc v9.3.1 reports:

bpf.c: In function ‘bpf_get_work_dir’:
bpf.c:784:49: warning: ‘snprintf’ output may be truncated before the last format character [-Wformat-truncation=]
  784 |  snprintf(bpf_wrk_dir, sizeof(bpf_wrk_dir), "%s/", mnt);
      |                                                 ^
bpf.c:784:2: note: ‘snprintf’ output between 2 and 4097 bytes into a destination of size 4096
  784 |  snprintf(bpf_wrk_dir, sizeof(bpf_wrk_dir), "%s/", mnt);
      |  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Fix this simply checking snprintf return code and properly handling the error.

Fixes: e42256699c ("bpf: make tc's bpf loader generic and move into lib")
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-05-27 15:05:25 -07:00
Andrea Claudi
358abfe004 Revert "bpf: replace snprintf with asprintf when dealing with long buffers"
This reverts commit c0325b0638.
It introduces a segfault in bpf_make_custom_path() when custom pinning is used.

This happens because asprintf allocates exactly the space needed to hold a
string in the buffer passed as its first argument, but if this buffer is later
used in strcat() or similar we have a buffer overrun.

As the aim of commit c0325b0638 is simply to fix a compiler warning, it
seems safe and reasonable to revert it.

Fixes: c0325b0638 ("bpf: replace snprintf with asprintf when dealing with long buffers")
Reported-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-05-27 15:05:25 -07:00
David Ahern
e50290e687 Merge branch 'master' into next
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-05-27 02:08:27 +00:00
Tuong Lien
9a25abde3a tipc: enable printing of broadcast rcv link stats
This commit allows printing the statistics of a broadcast-receiver link
using the same tipc command, but with additional 'link' options:

$ tipc link stat show --help
Usage: tipc link stat show [ link { LINK | SUBSTRING | all } ]

With:
+ 'LINK'      : print the stats of the specific link 'LINK';
+ 'SUBSTRING' : print the stats of all the links having the 'SUBSTRING'
                in name;
+ 'all'       : print all the links' stats incl. the broadcast-receiver
                ones;

Also, a link stats can be reset in the usual way by specifying the link
name in command.

For example:

$ tipc l st sh l br
Link <broadcast-link>
  Window:50 packets
  RX packets:0 fragments:0/0 bundles:0/0
  TX packets:5011125 fragments:4968774/149643 bundles:38402/307061
  RX naks:781484 defs:0 dups:0
  TX naks:0 acks:0 retrans:330259
  Congestion link:50657  Send queue max:0 avg:0

Link <broadcast-link:1001001>
  Window:50 packets
  RX packets:95146 fragments:95040/1980 bundles:1/2
  TX packets:0 fragments:0/0 bundles:0/0
  RX naks:380938 defs:83962 dups:403
  TX naks:8362 acks:0 retrans:170662
  Congestion link:0  Send queue max:0 avg:0

Link <broadcast-link:1001002>
  Window:50 packets
  RX packets:0 fragments:0/0 bundles:0/0
  TX packets:0 fragments:0/0 bundles:0/0
  RX naks:400546 defs:0 dups:0
  TX naks:0 acks:0 retrans:159597
  Congestion link:0  Send queue max:0 avg:0

$ tipc l st sh l 1001002
Link <1001003:data0-1001002:data0>
  ACTIVE  MTU:1500  Priority:10  Tolerance:1500 ms  Window:50 packets
  RX packets:99546 fragments:0/0 bundles:33/877
  TX packets:629 fragments:0/0 bundles:35/828
  TX profile sample:8 packets average:390 octets
  0-64:75% -256:0% -1024:0% -4096:25% -16384:0% -32768:0% -66000:0%
  RX states:488714 probes:7397 naks:0 defs:4 dups:5
  TX states:27734 probes:18016 naks:5 acks:2305 retrans:0
  Congestion link:0  Send queue max:0 avg:0

Link <broadcast-link:1001002>
  Window:50 packets
  RX packets:0 fragments:0/0 bundles:0/0
  TX packets:0 fragments:0/0 bundles:0/0
  RX naks:400546 defs:0 dups:0
  TX naks:0 acks:0 retrans:159597
  Congestion link:0  Send queue max:0 avg:0

$ tipc l st re l broadcast-link:1001002

$ tipc l st sh l broadcast-link:1001002
Link <broadcast-link:1001002>
  Window:50 packets
  RX packets:0 fragments:0/0 bundles:0/0
  TX packets:0 fragments:0/0 bundles:0/0
  RX naks:0 defs:0 dups:0
  TX naks:0 acks:0 retrans:0
  Congestion link:0  Send queue max:0 avg:0

Acked-by: Ying Xue <ying.xue@windriver.com>
Acked-by: Jon Maloy <jmaloy@redhat.com>
Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-05-27 02:07:22 +00:00
Alexander Aring
9f91f1b7b8 lwtunnel: add support for rpl segment routing
This patch adds support for rpl segment routing settings.
Example:

ip -n ns0 -6 route add 2001::3 encap rpl segs \
fe80::c8fe:beef:cafe:cafe,fe80::c8fe:beef:cafe:beef dev lowpan0

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-05-27 00:03:17 +00:00
Roman Mashak
db35e411ec tc: action: fix time values output in JSON format
Report tcf_t values in seconds, not jiffies, in JSON format as it is now
for stdout.

v2: use PRINT_ANY, drop the useless casts and fix the style (Stephen Hemminger)

Fixes: 2704bd6255 ("tc: jsonify actions core")
Cc: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-05-19 21:19:04 -07:00
Stephen Hemminger
1c7aa12104 uapi: update to bpf.h
Part of the zero-length array changes

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-05-19 14:31:54 -07:00
Eric Dumazet
d7c67a6ed4 utils: remove trailing zeros in print_time() and print_time64()
Before :

tc qd sh dev eth1

... refill_delay 40.0ms timer_slack 10.000us horizon 10.000s

After :
... refill_delay 40ms timer_slack 10us horizon 10s

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-05-19 14:30:30 -07:00
Paul Blakey
924c43778a man: tc-ct.8: Add manual page for ct tc action
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-05-19 14:30:24 -07:00
Maciej Fijalkowski
42796dcd36 tc: mqprio: reject queues count/offset pair count higher than num_tc
Provide a sanity check that will make sure whether queues count/offset
pair count will not exceed the actual number of TCs being created.

Example command that is invalid because there are 4 count/offset pairs
whereas num_tc is only 2.

 # tc qdisc add dev enp96s0f0 root mqprio num_tc 2 map 0 0 0 0 1 1 1 1
queues 4@0 4@4 4@8 4@12 hw 1 mode channel

Store the parsed count/offset pair count onto a dedicated variable that
will be compared against opt.num_tc after all of the command line
arguments were parsed. Bail out if this count is higher than opt.num_tc
and let user know about it.

Drivers were swallowing such commands as they were iterating over
count/offset pairs where num_tc was used as a delimiter, so this is not
a big deal, but better catch such misconfiguration at the command line
argument parsing level.

Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-05-18 14:57:15 +00:00
Dmitry Yakunin
7bd9188581 ss: add checks for bc filter support
As noted by David Ahern, now if some bytecode filter is not supported
by running kernel printed error message is not clear. This patch is attempt to
detect such case and print correct message. This is done by providing checking
function for new filter types. As example check function for cgroup filter
is implemented. It sends correct lightweight request (idiag_states = 0)
with zero cgroup condition to the kernel and checks returned errno. If filter
is not supported EINVAL is returned. Result of checking is cached to
avoid extra checks if several same filters are specified.

Signed-off-by: Dmitry Yakunin <zeil@yandex-team.ru>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-05-13 14:28:38 +00:00
Dmitry Yakunin
14f4bda590 ss: add support for cgroup v2 information and filtering
This patch introduces two new features: obtaining cgroup information and
filtering sockets by cgroups. These features work based on cgroup v2 ID
field in the socket (kernel should be compiled with CONFIG_SOCK_CGROUP_DATA).

Cgroup information can be obtained by specifying --cgroup flag and now contains
only pathname. For faster pathname lookups cgroup cache is implemented. This
cache is filled on ss startup and missed entries are resolved and saved
on the fly.

Cgroup filter extends EXPRESSION and allows to specify cgroup pathname
(relative or absolute) to obtain sockets attached only to this cgroup.
Filter syntax: ss [ cgroup PATHNAME ]
Examples:
    ss -a cgroup /sys/fs/cgroup/unified (or ss -a cgroup .)
    ss -a cgroup /sys/fs/cgroup/unified/cgroup1 (or ss -a cgroup cgroup1)

v2:
  - style fixes (David Ahern)

Signed-off-by: Dmitry Yakunin <zeil@yandex-team.ru>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-05-13 14:28:35 +00:00
Dmitry Yakunin
d5e6ee0dac ss: introduce cgroup2 cache and helper functions
This patch prepares infrastructure for matching sockets by cgroups.
Two helper functions are added for transformation between cgroup v2 ID
and pathname. Cgroup v2 cache is implemented as hash table indexed by ID.
This cache is needed for faster lookups of socket cgroup.

v2:
  - style fixes (David Ahern)

Signed-off-by: Dmitry Yakunin <zeil@yandex-team.ru>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-05-13 14:28:04 +00:00
Po Liu
965a5f6a1b iproute2-next: add gate action man page
This patch is to add the man page for the tc gate action.

Signed-off-by: Po Liu <Po.Liu@nxp.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-05-13 02:20:12 +00:00
Po Liu
07d5ee70b5 iproute2-next:tc:action: add a gate control action
Introduce a ingress frame gate control flow action.
Tc gate action does the work like this:
Assume there is a gate allow specified ingress frames can pass at
specific time slot, and also drop at specific time slot. Tc filter
chooses the ingress frames, and tc gate action would specify what slot
does these frames can be passed to device and what time slot would be
dropped.
Tc gate action would provide an entry list to tell how much time gate
keep open and how much time gate keep state close. Gate action also
assign a start time to tell when the entry list start. Then driver would
repeat the gate entry list cyclically.
For the software simulation, gate action require the user assign a time
clock type.

Below is the setting example in user space. Tc filter a stream source ip
address is 192.168.0.20 and gate action own two time slots. One is last
200ms gate open let frame pass another is last 100ms gate close let
frames dropped.

 # tc qdisc add dev eth0 ingress
 # tc filter add dev eth0 parent ffff: protocol ip \

            flower src_ip 192.168.0.20 \
            action gate index 2 clockid CLOCK_TAI \
            sched-entry open 200000000ns -1 8000000b \
            sched-entry close 100000000ns

 # tc chain del dev eth0 ingress chain 0

"sched-entry" follow the name taprio style. Gate state is
"open"/"close". Follow the period nanosecond. Then next -1 is internal
priority value means which ingress queue should put to. "-1" means
wildcard. The last value optional specifies the maximum number of
MSDU octets that are permitted to pass the gate during the specified
time interval, the overlimit frames would be dropped.

Below example shows filtering a stream with destination mac address is
10:00:80:00:00:00 and ip type is ICMP, follow the action gate. The gate
action would run with one close time slot which means always keep close.
The time cycle is total 200000000ns. The base-time would calculate by:

     1357000000000 + (N + 1) * cycletime

When the total value is the future time, it will be the start time.
The cycletime here would be 200000000ns for this case.

 #tc filter add dev eth0 parent ffff:  protocol ip \
           flower skip_hw ip_proto icmp dst_mac 10:00:80:00:00:00 \
           action gate index 12 base-time 1357000000000ns \
           sched-entry CLOSE 200000000ns \
           clockid CLOCK_TAI

Signed-off-by: Po Liu <Po.Liu@nxp.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-05-13 02:19:46 +00:00