mirror of
https://git.proxmox.com/git/mirror_iproute2
synced 2025-11-01 10:18:26 +00:00
This work adds the tc frontend for kernel commit e2e9b6541dd4 ("cls_bpf:
add initial eBPF support for programmable classifiers").
A C-like classifier program (e.g. see e2e9b6541dd4) is compiled via
LLVM's eBPF backend into an ELF file, which is then passed to tc. tc
loads the eBPF maps, if any, and the eBPF opcodes (with fixed-up eBPF map
file descriptors) from their dedicated ELF sections into the kernel via
bpf(2), and then passes the resulting fd via netlink down to cls_bpf.
cls_bpf allows for annotations; currently, I've used the file name for
that, so that users can easily identify their filter when dumping
configurations back.
Example usage:

  clang -O2 -emit-llvm -c cls.c -o - | llc -march=bpf -filetype=obj -o cls.o
  tc filter add dev em1 parent 1: bpf run object-file cls.o classid x:y

  tc filter show dev em1 [...]
  filter parent 1: protocol all pref 49152 bpf handle 0x1 flowid x:y cls.o

I placed the parser bits, derived from Alexei's kernel sample, into tc_bpf.c,
as my next step is to add the same support for the BPF action, so that we can
have a fully fledged eBPF classifier and action in tc.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
| File | | |
|---|---|---|
| .gitignore | ||
| em_canid.c | ||
| em_cmp.c | ||
| em_ipset.c | ||
| em_meta.c | ||
| em_nbyte.c | ||
| em_u32.c | ||
| emp_ematch.l | ||
| emp_ematch.y | ||
| f_basic.c | ||
| f_bpf.c | ||
| f_cgroup.c | ||
| f_flow.c | ||
| f_fw.c | ||
| f_route.c | ||
| f_rsvp.c | ||
| f_tcindex.c | ||
| f_u32.c | ||
| m_action.c | ||
| m_bpf.c | ||
| m_csum.c | ||
| m_ematch.c | ||
| m_ematch.h | ||
| m_estimator.c | ||
| m_gact.c | ||
| m_ipt.c | ||
| m_mirred.c | ||
| m_nat.c | ||
| m_pedit.c | ||
| m_pedit.h | ||
| m_police.c | ||
| m_simple.c | ||
| m_skbedit.c | ||
| m_vlan.c | ||
| m_xt_old.c | ||
| m_xt.c | ||
| Makefile | ||
| p_icmp.c | ||
| p_ip.c | ||
| p_tcp.c | ||
| p_udp.c | ||
| q_atm.c | ||
| q_cbq.c | ||
| q_choke.c | ||
| q_codel.c | ||
| q_drr.c | ||
| q_dsmark.c | ||
| q_fifo.c | ||
| q_fq_codel.c | ||
| q_fq.c | ||
| q_gred.c | ||
| q_hfsc.c | ||
| q_hhf.c | ||
| q_htb.c | ||
| q_ingress.c | ||
| q_mqprio.c | ||
| q_multiq.c | ||
| q_netem.c | ||
| q_pie.c | ||
| q_prio.c | ||
| q_qfq.c | ||
| q_red.c | ||
| q_rr.c | ||
| q_sfb.c | ||
| q_sfq.c | ||
| q_tbf.c | ||
| README.last | ||
| static-syms.c | ||
| tc_bpf.c | ||
| tc_bpf.h | ||
| tc_cbq.c | ||
| tc_cbq.h | ||
| tc_class.c | ||
| tc_common.h | ||
| tc_core.c | ||
| tc_core.h | ||
| tc_estimator.c | ||
| tc_filter.c | ||
| tc_monitor.c | ||
| tc_qdisc.c | ||
| tc_red.c | ||
| tc_red.h | ||
| tc_stab.c | ||
| tc_util.c | ||
| tc_util.h | ||
| tc.c | ||
Kernel code and interface.
--------------------------

* Compile time switches

There is only one, but very important, compile time switch.
It is not settable by "make config", but should be selected
manually, after a bit of thinking, in <include/net/pkt_sched.h>.

PSCHED_CLOCK_SOURCE can take three values:

	PSCHED_GETTIMEOFDAY
	PSCHED_JIFFIES
	PSCHED_CPU


PSCHED_GETTIMEOFDAY

The default setting is the most conservative one, PSCHED_GETTIMEOFDAY.
It is very slow, both because of the odd slowness of do_gettimeofday()
and because it forces code to use the unnatural "timeval" format,
where the microseconds and seconds fields are separate.
Besides that, it will misbehave when delays exceed 2 seconds
(e.g. on very slow links or classes bounded to a small slice of bandwidth).
To sum up: as soon as you get it working, select the correct clock
source and forget about PSCHED_GETTIMEOFDAY forever.


PSCHED_JIFFIES

The clock is derived from jiffies. On architectures with HZ=100,
the granularity of this clock is not enough to make reasonable
bindings to real time. However, taking into account Linux
architecture problems, which force us to use an artificial
integrated clock in any case, this switch is not so bad
for scheduling even on high speed networks, though policing
is not reliable.


PSCHED_CPU

It is available only on Alpha and on Pentiums with a correct
CPU timestamp. It is the fastest way; use it when it is available,
but remember: not all Pentiums have this facility, and
a lot of them have a clock broken by APM etc.
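
The granularity remark under PSCHED_JIFFIES can be made concrete with a
little arithmetic. The sketch below is only an illustration and is not taken
from the kernel sources: the helper names (jiffy_usec, tx_time_usec,
packets_per_jiffy), the 1500-byte packet size and the link rates are all
assumptions chosen for the example. It compares the length of one clock tick
with the time needed to transmit a single packet.

```c
/* Illustrative helpers only; names and values are assumptions,
 * not part of the kernel or of tc. */

#define USEC_PER_SEC 1000000LL

/* Length of one jiffy in microseconds for a given tick rate (HZ). */
static long long jiffy_usec(long long hz)
{
	return USEC_PER_SEC / hz;
}

/* Time in microseconds needed to transmit `bytes` at `bits_per_sec`. */
static long long tx_time_usec(long long bytes, long long bits_per_sec)
{
	return (bytes * 8 * USEC_PER_SEC) / bits_per_sec;
}

/* Number of whole packets that fit into a single clock tick. */
static long long packets_per_jiffy(long long hz, long long bytes,
				   long long bits_per_sec)
{
	return jiffy_usec(hz) / tx_time_usec(bytes, bits_per_sec);
}
```

Under these assumptions, with HZ=100 one tick lasts 10000 microseconds,
while a 1500-byte packet on a 100 Mbit/s link takes 120 microseconds, so
packets_per_jiffy(100, 1500, 100000000LL) gives 83. A clock that advances
only once per 83 packets cannot time individual packets, which is consistent
with the README's statement that policing is not reliable with this clock
source.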