mirror of
https://git.proxmox.com/git/mirror_ubuntu-kernels.git
synced 2025-12-09 08:33:35 +00:00
In kernel, it is common to check "skb->sk && sk_fullsock(skb->sk)"
before accessing the fields in sock. For example, in __netdev_pick_tx:
static u16 __netdev_pick_tx(struct net_device *dev, struct sk_buff *skb,
struct net_device *sb_dev)
{
/* ... */
struct sock *sk = skb->sk;
if (queue_index != new_index && sk &&
sk_fullsock(sk) &&
rcu_access_pointer(sk->sk_dst_cache))
sk_tx_queue_set(sk, new_index);
/* ... */
return queue_index;
}
This patch adds a "struct bpf_sock *sk" pointer to the "struct __sk_buff"
where a few of the convert_ctx_access() in filter.c has already been
accessing the skb->sk sock_common's fields,
e.g. sock_ops_convert_ctx_access().
"__sk_buff->sk" is a PTR_TO_SOCK_COMMON_OR_NULL in the verifier.
Some of the fileds in "bpf_sock" will not be directly
accessible through the "__sk_buff->sk" pointer. It is limited
by the new "bpf_sock_common_is_valid_access()".
e.g. The existing "type", "protocol", "mark" and "priority" in bpf_sock
are not allowed.
The newly added "struct bpf_sock *bpf_sk_fullsock(struct bpf_sock *sk)"
can be used to get a sk with all accessible fields in "bpf_sock".
This helper is added to both cg_skb and sched_(cls|act).
int cg_skb_foo(struct __sk_buff *skb) {
struct bpf_sock *sk;
sk = skb->sk;
if (!sk)
return 1;
sk = bpf_sk_fullsock(sk);
if (!sk)
return 1;
if (sk->family != AF_INET6 || sk->protocol != IPPROTO_TCP)
return 1;
/* some_traffic_shaping(); */
return 1;
}
(1) The sk is read only
(2) There is no new "struct bpf_sock_common" introduced.
(3) Future kernel sock's members could be added to bpf_sock only
instead of repeatedly adding at multiple places like currently
in bpf_sock_ops_md, bpf_sock_addr_md, sk_reuseport_md...etc.
(4) After "sk = skb->sk", the reg holding sk is in type
PTR_TO_SOCK_COMMON_OR_NULL.
(5) After bpf_sk_fullsock(), the return type will be in type
PTR_TO_SOCKET_OR_NULL which is the same as the return type of
bpf_sk_lookup_xxx().
However, bpf_sk_fullsock() does not take refcnt. The
acquire_reference_state() is only depending on the return type now.
To avoid it, a new is_acquire_function() is checked before calling
acquire_reference_state().
(6) The WARN_ON in "release_reference_state()" is no longer an
internal verifier bug.
When reg->id is not found in state->refs[], it means the
bpf_prog does something wrong like
"bpf_sk_release(bpf_sk_fullsock(skb->sk))" where reference has
never been acquired by calling "bpf_sk_fullsock(skb->sk)".
A -EINVAL and a verbose are done instead of WARN_ON. A test is
added to the test_verifier in a later patch.
Since the WARN_ON in "release_reference_state()" is no longer
needed, "__release_reference_state()" is folded into
"release_reference_state()" also.
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
||
|---|---|---|
| .. | ||
| bpf | ||
| cgroup | ||
| configs | ||
| debug | ||
| dma | ||
| events | ||
| gcov | ||
| irq | ||
| livepatch | ||
| locking | ||
| power | ||
| printk | ||
| rcu | ||
| sched | ||
| time | ||
| trace | ||
| .gitignore | ||
| acct.c | ||
| async.c | ||
| audit_fsnotify.c | ||
| audit_tree.c | ||
| audit_watch.c | ||
| audit.c | ||
| audit.h | ||
| auditfilter.c | ||
| auditsc.c | ||
| backtracetest.c | ||
| bounds.c | ||
| capability.c | ||
| compat.c | ||
| configs.c | ||
| context_tracking.c | ||
| cpu_pm.c | ||
| cpu.c | ||
| crash_core.c | ||
| crash_dump.c | ||
| cred.c | ||
| delayacct.c | ||
| dma.c | ||
| elfcore.c | ||
| exec_domain.c | ||
| exit.c | ||
| extable.c | ||
| fail_function.c | ||
| fork.c | ||
| freezer.c | ||
| futex.c | ||
| groups.c | ||
| hung_task.c | ||
| iomem.c | ||
| irq_work.c | ||
| jump_label.c | ||
| kallsyms.c | ||
| kcmp.c | ||
| Kconfig.freezer | ||
| Kconfig.hz | ||
| Kconfig.locks | ||
| Kconfig.preempt | ||
| kcov.c | ||
| kexec_core.c | ||
| kexec_file.c | ||
| kexec_internal.h | ||
| kexec.c | ||
| kmod.c | ||
| kprobes.c | ||
| ksysfs.c | ||
| kthread.c | ||
| latencytop.c | ||
| Makefile | ||
| memremap.c | ||
| module_signing.c | ||
| module-internal.h | ||
| module.c | ||
| notifier.c | ||
| nsproxy.c | ||
| padata.c | ||
| panic.c | ||
| params.c | ||
| pid_namespace.c | ||
| pid.c | ||
| profile.c | ||
| ptrace.c | ||
| range.c | ||
| reboot.c | ||
| relay.c | ||
| resource.c | ||
| rseq.c | ||
| seccomp.c | ||
| signal.c | ||
| smp.c | ||
| smpboot.c | ||
| smpboot.h | ||
| softirq.c | ||
| stackleak.c | ||
| stacktrace.c | ||
| stop_machine.c | ||
| sys_ni.c | ||
| sys.c | ||
| sysctl_binary.c | ||
| sysctl.c | ||
| task_work.c | ||
| taskstats.c | ||
| test_kprobes.c | ||
| torture.c | ||
| tracepoint.c | ||
| tsacct.c | ||
| ucount.c | ||
| uid16.c | ||
| uid16.h | ||
| umh.c | ||
| up.c | ||
| user_namespace.c | ||
| user-return-notifier.c | ||
| user.c | ||
| utsname_sysctl.c | ||
| utsname.c | ||
| watchdog_hld.c | ||
| watchdog.c | ||
| workqueue_internal.h | ||
| workqueue.c | ||