194 lines
7.4 KiB
ReStructuredText

=============================
BPF Kernel Functions (kfuncs)
=============================

1. Introduction
===============

BPF Kernel Functions, more commonly known as kfuncs, are functions in the Linux
kernel that are exposed for use by BPF programs. Unlike normal BPF helpers,
kfuncs do not have a stable interface and can change from one kernel release to
another. Hence, BPF programs need to be updated in response to changes in the
kernel.

2. Defining a kfunc
===================

There are two ways to expose a kernel function to BPF programs: either make an
existing function in the kernel visible, or add a new wrapper for BPF. In both
cases, care must be taken that BPF programs can only call such a function in a
valid context. To enforce this, visibility of a kfunc can be per program type.

If you are not creating a BPF wrapper for an existing kernel function, skip
ahead to :ref:`BPF_kfunc_nodef`.

2.1 Creating a wrapper kfunc
----------------------------

When defining a wrapper kfunc, the wrapper function should have extern linkage.
This prevents the compiler from optimizing away dead code, as this wrapper kfunc
is not invoked anywhere in the kernel itself. It is not necessary to provide a
prototype in a header for the wrapper kfunc.

An example is given below::

        /* Disables missing prototype warnings */
        __diag_push();
        __diag_ignore_all("-Wmissing-prototypes",
                          "Global kfuncs as their definitions will be in BTF");

        struct task_struct *bpf_find_get_task_by_vpid(pid_t nr)
        {
                return find_get_task_by_vpid(nr);
        }

        __diag_pop();

A wrapper kfunc is often needed when we need to annotate parameters of the
kfunc. Otherwise one may directly make the kfunc visible to the BPF program by
registering it with the BPF subsystem. See :ref:`BPF_kfunc_nodef`.

2.2 Annotating kfunc parameters
-------------------------------

Similar to BPF helpers, the verifier sometimes needs additional context to make
the usage of kernel functions safer and more useful. Hence, we can annotate a
parameter by suffixing the name of the argument of the kfunc with a __tag, where
tag may be one of the supported annotations.

2.2.1 __sz Annotation
---------------------

This annotation is used to indicate a memory and size pair in the argument list.
An example is given below::

        void bpf_memzero(void *mem, int mem__sz)
        {
        ...
        }

Here, the verifier will treat the first argument as a PTR_TO_MEM, and the second
argument as its size. By default, without the __sz annotation, the size of the
type of the pointer is used. Without the __sz annotation, a kfunc cannot accept
a void pointer.
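
As a rough illustration of the pairing, the elided body of bpf_memzero could be
filled in as below. This is a userspace sketch, not kernel code: it assumes that
a plain memset is all the function does, and the guard against NULL and
non-positive sizes is an assumption added for this example. In the kernel, it is
the verifier that checks that mem points to at least mem__sz bytes before the
call is allowed.

```c
#include <string.h>

/*
 * Userspace sketch of a kfunc taking a memory/size pair.
 * The __sz suffix on the second parameter name is what marks it as the
 * size of the memory pointed to by the first parameter.
 */
void bpf_memzero(void *mem, int mem__sz)
{
        /* Assumption for this sketch: ignore NULL and non-positive sizes. */
        if (!mem || mem__sz <= 0)
                return;

        memset(mem, 0, (size_t)mem__sz);
}
```

Calling bpf_memzero(buf, 4) on an 8-byte buffer clears only the first four
bytes; the remaining bytes are left untouched.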

.. _BPF_kfunc_nodef:

2.3 Using an existing kernel function
-------------------------------------

When an existing function in the kernel is fit for consumption by BPF programs,
it can be directly registered with the BPF subsystem. However, care must still
be taken to review the context in which it will be invoked by the BPF program
and whether it is safe to do so.

2.4 Annotating kfuncs
---------------------

In addition to kfuncs' arguments, the verifier may need more information about
the type of kfunc(s) being registered with the BPF subsystem. To do so, we
define flags on a set of kfuncs as follows::

        BTF_SET8_START(bpf_task_set)
        BTF_ID_FLAGS(func, bpf_get_task_pid, KF_ACQUIRE | KF_RET_NULL)
        BTF_ID_FLAGS(func, bpf_put_pid, KF_RELEASE)
        BTF_SET8_END(bpf_task_set)

This set encodes the BTF ID of each kfunc listed above, and encodes the flags
along with it. Of course, it is also allowed to specify no flags.
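
To make the idea of a set that pairs each kfunc with its flags more concrete,
here is a small userspace model. It is only a sketch under stated assumptions:
mock_kfunc_entry, mock_kfunc_flags, the bpf_no_flags entry and the MOCK_KF_*
values are hypothetical names invented for this example, and string names stand
in for the BTF IDs that the real macros resolve at build time.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical flag bits for this sketch; not the kernel's definitions. */
#define MOCK_KF_ACQUIRE   (1u << 0)
#define MOCK_KF_RELEASE   (1u << 1)
#define MOCK_KF_RET_NULL  (1u << 2)

/* One entry of the mock set: a name standing in for a BTF ID, plus flags. */
struct mock_kfunc_entry {
        const char *name;
        unsigned int flags;
};

static const struct mock_kfunc_entry mock_task_set[] = {
        { "bpf_get_task_pid", MOCK_KF_ACQUIRE | MOCK_KF_RET_NULL },
        { "bpf_put_pid",      MOCK_KF_RELEASE },
        { "bpf_no_flags",     0 },  /* hypothetical entry with no flags */
};

/*
 * Look up a kfunc in the mock set.
 * Returns 0 and stores its flags in *flags if found, -1 otherwise.
 */
int mock_kfunc_flags(const char *name, unsigned int *flags)
{
        size_t i;

        for (i = 0; i < sizeof(mock_task_set) / sizeof(mock_task_set[0]); i++) {
                if (strcmp(mock_task_set[i].name, name) == 0) {
                        *flags = mock_task_set[i].flags;
                        return 0;
                }
        }
        return -1;
}
```

A caller can then test individual bits, for example
flags & MOCK_KF_RET_NULL, to decide which checks apply to a given kfunc.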

2.4.1 KF_ACQUIRE flag
---------------------

The KF_ACQUIRE flag is used to indicate that the kfunc returns a pointer to a
refcounted object. The verifier will then ensure that the pointer to the object
is eventually released using a release kfunc, or transferred to a map using a
referenced kptr (by invoking bpf_kptr_xchg). Otherwise, the verifier refuses to
load the BPF program: loading succeeds only once no lingering references remain
in any of the explored states of the program.

2.4.2 KF_RET_NULL flag
----------------------

The KF_RET_NULL flag is used to indicate that the pointer returned by the kfunc
may be NULL. Hence, it forces the user to do a NULL check on the pointer
returned from the kfunc before making use of it (dereferencing or passing to
another helper). This flag is often paired with the KF_ACQUIRE flag, but the two
are orthogonal to each other.

2.4.3 KF_RELEASE flag
---------------------

The KF_RELEASE flag is used to indicate that the kfunc releases the pointer
passed in to it. There can be only one referenced pointer that can be passed in.
All copies of the pointer being released are invalidated as a result of invoking
a kfunc with this flag.
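
The acquire, NULL-check and release contract described in the three sections
above can be sketched in plain C. This is a userspace model rather than a BPF
program: mock_obj, mock_obj_acquire, mock_obj_release and mock_prog are
hypothetical names, and the checks that the verifier performs statically are
only mirrored here by the shape of the calling code.

```c
#include <stddef.h>

/* Hypothetical refcounted object used only in this sketch. */
struct mock_obj {
        int refcount;
        int alive;      /* 0 makes the acquire fail, mimicking a NULL return */
};

/* Plays the role of a KF_ACQUIRE | KF_RET_NULL kfunc: may return NULL. */
struct mock_obj *mock_obj_acquire(struct mock_obj *obj)
{
        if (!obj || !obj->alive)
                return NULL;
        obj->refcount++;
        return obj;
}

/* Plays the role of a KF_RELEASE kfunc: drops the reference taken above. */
void mock_obj_release(struct mock_obj *obj)
{
        obj->refcount--;
}

/*
 * The calling pattern the flags demand: NULL-check the acquired pointer
 * before use, and release it on every path that acquired it.
 * Returns 0 on success, -1 if the acquire failed.
 */
int mock_prog(struct mock_obj *target)
{
        struct mock_obj *ref = mock_obj_acquire(target);

        if (!ref)               /* needed because of KF_RET_NULL */
                return -1;

        /* ... use ref here ... */

        mock_obj_release(ref);  /* needed because of KF_ACQUIRE */
        return 0;
}
```

After mock_prog returns, the refcount of the target is back to its initial
value on both the success and the failure path, which is the property the
verifier enforces for real kfuncs.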

2.4.4 KF_KPTR_GET flag
----------------------

The KF_KPTR_GET flag is used to indicate that the kfunc takes the first argument
as a pointer to kptr, safely increments the refcount of the object it points to,
and returns a reference to the user. The rest of the arguments may be normal
arguments of a kfunc. The KF_KPTR_GET flag should be used in conjunction with
KF_ACQUIRE and KF_RET_NULL flags.

2.4.5 KF_TRUSTED_ARGS flag
--------------------------

The KF_TRUSTED_ARGS flag is used for kfuncs taking pointer arguments. It
indicates that all pointer arguments will always have a guaranteed lifetime,
and pointers to kernel objects are always passed to helpers in their unmodified
form (as obtained from acquire kfuncs).

It can be used to enforce that a pointer to a refcounted object acquired from a
kfunc or BPF helper is passed as an argument to this kfunc without any
modifications (e.g. pointer arithmetic) such that it is trusted and points to
the original object.

Meanwhile, it is also allowed to pass pointers to normal memory to such kfuncs,
but those can have a non-zero offset.

This flag is often used for kfuncs that operate (change some property, perform
some operation) on an object that was obtained using an acquire kfunc. Such
kfuncs need an unchanged pointer to ensure the integrity of the operation being
performed on the expected object.
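
The rules above can be summarised as a small decision function. The following
is a simplified userspace model, not the verifier's actual logic:
mock_ptr_kind, mock_arg and mock_trusted_arg_ok are hypothetical names, and the
three pointer kinds are an assumed simplification of the state the verifier
tracks for each argument.

```c
#include <stdbool.h>

/* Hypothetical, simplified view of what is tracked per pointer argument. */
enum mock_ptr_kind {
        MOCK_PTR_ACQUIRED_OBJ,  /* kernel object obtained from an acquire kfunc */
        MOCK_PTR_UNTRACKED_OBJ, /* kernel object with no guaranteed lifetime */
        MOCK_PTR_MEM,           /* normal memory */
};

struct mock_arg {
        enum mock_ptr_kind kind;
        long off;       /* offset added to the pointer since it was obtained */
};

/* Would this argument be acceptable to a kfunc marked KF_TRUSTED_ARGS? */
bool mock_trusted_arg_ok(const struct mock_arg *arg)
{
        switch (arg->kind) {
        case MOCK_PTR_ACQUIRED_OBJ:
                /* Must be exactly the pointer that was acquired. */
                return arg->off == 0;
        case MOCK_PTR_MEM:
                /* Normal memory may carry a non-zero offset. */
                return true;
        default:
                /* No lifetime guarantee, so not trusted. */
                return false;
        }
}
```

In this model, an acquired object pointer with off == 0 is accepted, the same
pointer after pointer arithmetic is rejected, and a pointer to normal memory is
accepted regardless of its offset.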

2.4.6 KF_SLEEPABLE flag
-----------------------

The KF_SLEEPABLE flag is used for kfuncs that may sleep. Such kfuncs can only
be called by sleepable BPF programs (BPF_F_SLEEPABLE).

2.4.7 KF_DESTRUCTIVE flag
-------------------------

The KF_DESTRUCTIVE flag is used to indicate functions whose invocation is
destructive to the system. For example, such a call can result in the system
rebooting or panicking. Because of this, additional restrictions apply to these
calls. At the moment they only require the CAP_SYS_BOOT capability, but more can
be added later.
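
The two restrictions above amount to a gate that is evaluated before a call is
permitted. The sketch below models that gate in userspace; mock_prog_ctx,
mock_kfunc_call_allowed and the MOCK_KF_* bit values are hypothetical and chosen
only for this example, and a boolean stands in for the real capability check.

```c
#include <stdbool.h>

/* Hypothetical flag bits for this sketch; not the kernel's definitions. */
#define MOCK_KF_SLEEPABLE    (1u << 0)
#define MOCK_KF_DESTRUCTIVE  (1u << 1)

/* Hypothetical description of the program attempting the call. */
struct mock_prog_ctx {
        bool sleepable;         /* program was loaded as sleepable */
        bool has_cap_sys_boot;  /* loader holds CAP_SYS_BOOT */
};

/* Decide whether a kfunc with the given flags may be called by the program. */
bool mock_kfunc_call_allowed(unsigned int kfunc_flags,
                             const struct mock_prog_ctx *prog)
{
        /* Kfuncs that may sleep need a sleepable program. */
        if ((kfunc_flags & MOCK_KF_SLEEPABLE) && !prog->sleepable)
                return false;

        /* Destructive kfuncs need the extra capability. */
        if ((kfunc_flags & MOCK_KF_DESTRUCTIVE) && !prog->has_cap_sys_boot)
                return false;

        return true;
}
```

Each flag adds an independent requirement, so a kfunc carrying both flags is
only callable from a sleepable program whose loader also holds the capability.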

2.5 Registering the kfuncs
--------------------------

Once the kfunc is prepared for use, the final step to making it visible is
registering it with the BPF subsystem. Registration is done per BPF program
type. An example is shown below::

        BTF_SET8_START(bpf_task_set)
        BTF_ID_FLAGS(func, bpf_get_task_pid, KF_ACQUIRE | KF_RET_NULL)
        BTF_ID_FLAGS(func, bpf_put_pid, KF_RELEASE)
        BTF_SET8_END(bpf_task_set)

        static const struct btf_kfunc_id_set bpf_task_kfunc_set = {
                .owner = THIS_MODULE,
                .set   = &bpf_task_set,
        };

        static int init_subsystem(void)
        {
                return register_btf_kfunc_id_set(BPF_PROG_TYPE_TRACING, &bpf_task_kfunc_set);
        }
        late_initcall(init_subsystem);
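
To illustrate what registration per program type means for visibility, the
following userspace model keeps one set per program type and answers lookups
against it. Everything here is hypothetical and simplified: the mock_* names
are invented, a NULL-terminated list of names stands in for a BTF ID set, and
the limit of one set per program type is a restriction of this sketch only.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical program types for this sketch. */
enum mock_prog_type {
        MOCK_PROG_TYPE_TRACING,
        MOCK_PROG_TYPE_XDP,
        MOCK_PROG_TYPE_MAX,
};

/* A NULL-terminated list of kfunc names stands in for a BTF ID set. */
struct mock_kfunc_id_set {
        const char *const *names;
};

static const struct mock_kfunc_id_set *mock_registry[MOCK_PROG_TYPE_MAX];

/* Register a set for one program type. Returns 0 on success, -1 on error. */
int mock_register_kfunc_id_set(enum mock_prog_type type,
                               const struct mock_kfunc_id_set *set)
{
        if ((unsigned int)type >= MOCK_PROG_TYPE_MAX || !set)
                return -1;
        if (mock_registry[type])
                return -1;      /* sketch limitation: one set per type */
        mock_registry[type] = set;
        return 0;
}

/* A kfunc is visible only to program types it was registered for. */
bool mock_kfunc_visible(enum mock_prog_type type, const char *name)
{
        const char *const *p;

        if ((unsigned int)type >= MOCK_PROG_TYPE_MAX || !mock_registry[type])
                return false;
        for (p = mock_registry[type]->names; *p; p++)
                if (strcmp(*p, name) == 0)
                        return true;
        return false;
}
```

With a set registered only for MOCK_PROG_TYPE_TRACING, a lookup of the same
kfunc name from MOCK_PROG_TYPE_XDP fails, mirroring how a kfunc registered for
one program type is not callable from another.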