530 lines
19 KiB
ReStructuredText
530 lines
19 KiB
ReStructuredText
|
|
||
|
=============
|
||
|
eBPF verifier
|
||
|
=============
|
||
|
|
||
|
The safety of the eBPF program is determined in two steps.
|
||
|
|
||
|
First step does DAG check to disallow loops and other CFG validation.
|
||
|
In particular it will detect programs that have unreachable instructions.
|
||
|
(though classic BPF checker allows them)
|
||
|
|
||
|
Second step starts from the first insn and descends all possible paths.
|
||
|
It simulates execution of every insn and observes the state change of
|
||
|
registers and stack.
|
||
|
|
||
|
At the start of the program the register R1 contains a pointer to context
|
||
|
and has type PTR_TO_CTX.
|
||
|
If verifier sees an insn that does R2=R1, then R2 has now type
|
||
|
PTR_TO_CTX as well and can be used on the right hand side of expression.
|
||
|
If R1=PTR_TO_CTX and insn is R2=R1+R1, then R2=SCALAR_VALUE,
|
||
|
since addition of two valid pointers makes invalid pointer.
|
||
|
(In 'secure' mode verifier will reject any type of pointer arithmetic to make
|
||
|
sure that kernel addresses don't leak to unprivileged users)
|
||
|
|
||
|
If register was never written to, it's not readable::
|
||
|
|
||
|
bpf_mov R0 = R2
|
||
|
bpf_exit
|
||
|
|
||
|
will be rejected, since R2 is unreadable at the start of the program.
|
||
|
|
||
|
After kernel function call, R1-R5 are reset to unreadable and
|
||
|
R0 has a return type of the function.
|
||
|
|
||
|
Since R6-R9 are callee saved, their state is preserved across the call.
|
||
|
|
||
|
::
|
||
|
|
||
|
bpf_mov R6 = 1
|
||
|
bpf_call foo
|
||
|
bpf_mov R0 = R6
|
||
|
bpf_exit
|
||
|
|
||
|
is a correct program. If there was R1 instead of R6, it would have
|
||
|
been rejected.
|
||
|
|
||
|
load/store instructions are allowed only with registers of valid types, which
|
||
|
are PTR_TO_CTX, PTR_TO_MAP, PTR_TO_STACK. They are bounds and alignment checked.
|
||
|
For example::
|
||
|
|
||
|
bpf_mov R1 = 1
|
||
|
bpf_mov R2 = 2
|
||
|
bpf_xadd *(u32 *)(R1 + 3) += R2
|
||
|
bpf_exit
|
||
|
|
||
|
will be rejected, since R1 doesn't have a valid pointer type at the time of
|
||
|
execution of instruction bpf_xadd.
|
||
|
|
||
|
At the start R1 type is PTR_TO_CTX (a pointer to generic ``struct bpf_context``)
|
||
|
A callback is used to customize verifier to restrict eBPF program access to only
|
||
|
certain fields within ctx structure with specified size and alignment.
|
||
|
|
||
|
For example, the following insn::
|
||
|
|
||
|
bpf_ld R0 = *(u32 *)(R6 + 8)
|
||
|
|
||
|
intends to load a word from address R6 + 8 and store it into R0
|
||
|
If R6=PTR_TO_CTX, via is_valid_access() callback the verifier will know
|
||
|
that offset 8 of size 4 bytes can be accessed for reading, otherwise
|
||
|
the verifier will reject the program.
|
||
|
If R6=PTR_TO_STACK, then access should be aligned and be within
|
||
|
stack bounds, which are [-MAX_BPF_STACK, 0). In this example offset is 8,
|
||
|
so it will fail verification, since it's out of bounds.
|
||
|
|
||
|
The verifier will allow eBPF program to read data from stack only after
|
||
|
it wrote into it.
|
||
|
|
||
|
Classic BPF verifier does similar check with M[0-15] memory slots.
|
||
|
For example::
|
||
|
|
||
|
bpf_ld R0 = *(u32 *)(R10 - 4)
|
||
|
bpf_exit
|
||
|
|
||
|
is invalid program.
|
||
|
Though R10 is correct read-only register and has type PTR_TO_STACK
|
||
|
and R10 - 4 is within stack bounds, there were no stores into that location.
|
||
|
|
||
|
Pointer register spill/fill is tracked as well, since four (R6-R9)
|
||
|
callee saved registers may not be enough for some programs.
|
||
|
|
||
|
Allowed function calls are customized with bpf_verifier_ops->get_func_proto()
|
||
|
The eBPF verifier will check that registers match argument constraints.
|
||
|
After the call register R0 will be set to return type of the function.
|
||
|
|
||
|
Function calls is a main mechanism to extend functionality of eBPF programs.
|
||
|
Socket filters may let programs to call one set of functions, whereas tracing
|
||
|
filters may allow completely different set.
|
||
|
|
||
|
If a function made accessible to eBPF program, it needs to be thought through
|
||
|
from safety point of view. The verifier will guarantee that the function is
|
||
|
called with valid arguments.
|
||
|
|
||
|
seccomp vs socket filters have different security restrictions for classic BPF.
|
||
|
Seccomp solves this by two stage verifier: classic BPF verifier is followed
|
||
|
by seccomp verifier. In case of eBPF one configurable verifier is shared for
|
||
|
all use cases.
|
||
|
|
||
|
See details of eBPF verifier in kernel/bpf/verifier.c
|
||
|
|
||
|
Register value tracking
|
||
|
=======================
|
||
|
|
||
|
In order to determine the safety of an eBPF program, the verifier must track
|
||
|
the range of possible values in each register and also in each stack slot.
|
||
|
This is done with ``struct bpf_reg_state``, defined in include/linux/
|
||
|
bpf_verifier.h, which unifies tracking of scalar and pointer values. Each
|
||
|
register state has a type, which is either NOT_INIT (the register has not been
|
||
|
written to), SCALAR_VALUE (some value which is not usable as a pointer), or a
|
||
|
pointer type. The types of pointers describe their base, as follows:
|
||
|
|
||
|
|
||
|
PTR_TO_CTX
|
||
|
Pointer to bpf_context.
|
||
|
CONST_PTR_TO_MAP
|
||
|
Pointer to struct bpf_map. "Const" because arithmetic
|
||
|
on these pointers is forbidden.
|
||
|
PTR_TO_MAP_VALUE
|
||
|
Pointer to the value stored in a map element.
|
||
|
PTR_TO_MAP_VALUE_OR_NULL
|
||
|
Either a pointer to a map value, or NULL; map accesses
|
||
|
(see maps.rst) return this type, which becomes a
|
||
|
PTR_TO_MAP_VALUE when checked != NULL. Arithmetic on
|
||
|
these pointers is forbidden.
|
||
|
PTR_TO_STACK
|
||
|
Frame pointer.
|
||
|
PTR_TO_PACKET
|
||
|
skb->data.
|
||
|
PTR_TO_PACKET_END
|
||
|
skb->data + headlen; arithmetic forbidden.
|
||
|
PTR_TO_SOCKET
|
||
|
Pointer to struct bpf_sock_ops, implicitly refcounted.
|
||
|
PTR_TO_SOCKET_OR_NULL
|
||
|
Either a pointer to a socket, or NULL; socket lookup
|
||
|
returns this type, which becomes a PTR_TO_SOCKET when
|
||
|
checked != NULL. PTR_TO_SOCKET is reference-counted,
|
||
|
so programs must release the reference through the
|
||
|
socket release function before the end of the program.
|
||
|
Arithmetic on these pointers is forbidden.
|
||
|
|
||
|
However, a pointer may be offset from this base (as a result of pointer
|
||
|
arithmetic), and this is tracked in two parts: the 'fixed offset' and 'variable
|
||
|
offset'. The former is used when an exactly-known value (e.g. an immediate
|
||
|
operand) is added to a pointer, while the latter is used for values which are
|
||
|
not exactly known. The variable offset is also used in SCALAR_VALUEs, to track
|
||
|
the range of possible values in the register.
|
||
|
|
||
|
The verifier's knowledge about the variable offset consists of:
|
||
|
|
||
|
* minimum and maximum values as unsigned
|
||
|
* minimum and maximum values as signed
|
||
|
|
||
|
* knowledge of the values of individual bits, in the form of a 'tnum': a u64
|
||
|
'mask' and a u64 'value'. 1s in the mask represent bits whose value is unknown;
|
||
|
1s in the value represent bits known to be 1. Bits known to be 0 have 0 in both
|
||
|
mask and value; no bit should ever be 1 in both. For example, if a byte is read
|
||
|
into a register from memory, the register's top 56 bits are known zero, while
|
||
|
the low 8 are unknown - which is represented as the tnum (0x0; 0xff). If we
|
||
|
then OR this with 0x40, we get (0x40; 0xbf), then if we add 1 we get (0x0;
|
||
|
0x1ff), because of potential carries.
|
||
|
|
||
|
Besides arithmetic, the register state can also be updated by conditional
|
||
|
branches. For instance, if a SCALAR_VALUE is compared > 8, in the 'true' branch
|
||
|
it will have a umin_value (unsigned minimum value) of 9, whereas in the 'false'
|
||
|
branch it will have a umax_value of 8. A signed compare (with BPF_JSGT or
|
||
|
BPF_JSGE) would instead update the signed minimum/maximum values. Information
|
||
|
from the signed and unsigned bounds can be combined; for instance if a value is
|
||
|
first tested < 8 and then tested s> 4, the verifier will conclude that the value
|
||
|
is also > 4 and s< 8, since the bounds prevent crossing the sign boundary.
|
||
|
|
||
|
PTR_TO_PACKETs with a variable offset part have an 'id', which is common to all
|
||
|
pointers sharing that same variable offset. This is important for packet range
|
||
|
checks: after adding a variable to a packet pointer register A, if you then copy
|
||
|
it to another register B and then add a constant 4 to A, both registers will
|
||
|
share the same 'id' but the A will have a fixed offset of +4. Then if A is
|
||
|
bounds-checked and found to be less than a PTR_TO_PACKET_END, the register B is
|
||
|
now known to have a safe range of at least 4 bytes. See 'Direct packet access',
|
||
|
below, for more on PTR_TO_PACKET ranges.
|
||
|
|
||
|
The 'id' field is also used on PTR_TO_MAP_VALUE_OR_NULL, common to all copies of
|
||
|
the pointer returned from a map lookup. This means that when one copy is
|
||
|
checked and found to be non-NULL, all copies can become PTR_TO_MAP_VALUEs.
|
||
|
As well as range-checking, the tracked information is also used for enforcing
|
||
|
alignment of pointer accesses. For instance, on most systems the packet pointer
|
||
|
is 2 bytes after a 4-byte alignment. If a program adds 14 bytes to that to jump
|
||
|
over the Ethernet header, then reads IHL and addes (IHL * 4), the resulting
|
||
|
pointer will have a variable offset known to be 4n+2 for some n, so adding the 2
|
||
|
bytes (NET_IP_ALIGN) gives a 4-byte alignment and so word-sized accesses through
|
||
|
that pointer are safe.
|
||
|
The 'id' field is also used on PTR_TO_SOCKET and PTR_TO_SOCKET_OR_NULL, common
|
||
|
to all copies of the pointer returned from a socket lookup. This has similar
|
||
|
behaviour to the handling for PTR_TO_MAP_VALUE_OR_NULL->PTR_TO_MAP_VALUE, but
|
||
|
it also handles reference tracking for the pointer. PTR_TO_SOCKET implicitly
|
||
|
represents a reference to the corresponding ``struct sock``. To ensure that the
|
||
|
reference is not leaked, it is imperative to NULL-check the reference and in
|
||
|
the non-NULL case, and pass the valid reference to the socket release function.
|
||
|
|
||
|
Direct packet access
|
||
|
====================
|
||
|
|
||
|
In cls_bpf and act_bpf programs the verifier allows direct access to the packet
|
||
|
data via skb->data and skb->data_end pointers.
|
||
|
Ex::
|
||
|
|
||
|
1: r4 = *(u32 *)(r1 +80) /* load skb->data_end */
|
||
|
2: r3 = *(u32 *)(r1 +76) /* load skb->data */
|
||
|
3: r5 = r3
|
||
|
4: r5 += 14
|
||
|
5: if r5 > r4 goto pc+16
|
||
|
R1=ctx R3=pkt(id=0,off=0,r=14) R4=pkt_end R5=pkt(id=0,off=14,r=14) R10=fp
|
||
|
6: r0 = *(u16 *)(r3 +12) /* access 12 and 13 bytes of the packet */
|
||
|
|
||
|
this 2byte load from the packet is safe to do, since the program author
|
||
|
did check ``if (skb->data + 14 > skb->data_end) goto err`` at insn #5 which
|
||
|
means that in the fall-through case the register R3 (which points to skb->data)
|
||
|
has at least 14 directly accessible bytes. The verifier marks it
|
||
|
as R3=pkt(id=0,off=0,r=14).
|
||
|
id=0 means that no additional variables were added to the register.
|
||
|
off=0 means that no additional constants were added.
|
||
|
r=14 is the range of safe access which means that bytes [R3, R3 + 14) are ok.
|
||
|
Note that R5 is marked as R5=pkt(id=0,off=14,r=14). It also points
|
||
|
to the packet data, but constant 14 was added to the register, so
|
||
|
it now points to ``skb->data + 14`` and accessible range is [R5, R5 + 14 - 14)
|
||
|
which is zero bytes.
|
||
|
|
||
|
More complex packet access may look like::
|
||
|
|
||
|
|
||
|
R0=inv1 R1=ctx R3=pkt(id=0,off=0,r=14) R4=pkt_end R5=pkt(id=0,off=14,r=14) R10=fp
|
||
|
6: r0 = *(u8 *)(r3 +7) /* load 7th byte from the packet */
|
||
|
7: r4 = *(u8 *)(r3 +12)
|
||
|
8: r4 *= 14
|
||
|
9: r3 = *(u32 *)(r1 +76) /* load skb->data */
|
||
|
10: r3 += r4
|
||
|
11: r2 = r1
|
||
|
12: r2 <<= 48
|
||
|
13: r2 >>= 48
|
||
|
14: r3 += r2
|
||
|
15: r2 = r3
|
||
|
16: r2 += 8
|
||
|
17: r1 = *(u32 *)(r1 +80) /* load skb->data_end */
|
||
|
18: if r2 > r1 goto pc+2
|
||
|
R0=inv(id=0,umax_value=255,var_off=(0x0; 0xff)) R1=pkt_end R2=pkt(id=2,off=8,r=8) R3=pkt(id=2,off=0,r=8) R4=inv(id=0,umax_value=3570,var_off=(0x0; 0xfffe)) R5=pkt(id=0,off=14,r=14) R10=fp
|
||
|
19: r1 = *(u8 *)(r3 +4)
|
||
|
|
||
|
The state of the register R3 is R3=pkt(id=2,off=0,r=8)
|
||
|
id=2 means that two ``r3 += rX`` instructions were seen, so r3 points to some
|
||
|
offset within a packet and since the program author did
|
||
|
``if (r3 + 8 > r1) goto err`` at insn #18, the safe range is [R3, R3 + 8).
|
||
|
The verifier only allows 'add'/'sub' operations on packet registers. Any other
|
||
|
operation will set the register state to 'SCALAR_VALUE' and it won't be
|
||
|
available for direct packet access.
|
||
|
|
||
|
Operation ``r3 += rX`` may overflow and become less than original skb->data,
|
||
|
therefore the verifier has to prevent that. So when it sees ``r3 += rX``
|
||
|
instruction and rX is more than 16-bit value, any subsequent bounds-check of r3
|
||
|
against skb->data_end will not give us 'range' information, so attempts to read
|
||
|
through the pointer will give "invalid access to packet" error.
|
||
|
|
||
|
Ex. after insn ``r4 = *(u8 *)(r3 +12)`` (insn #7 above) the state of r4 is
|
||
|
R4=inv(id=0,umax_value=255,var_off=(0x0; 0xff)) which means that upper 56 bits
|
||
|
of the register are guaranteed to be zero, and nothing is known about the lower
|
||
|
8 bits. After insn ``r4 *= 14`` the state becomes
|
||
|
R4=inv(id=0,umax_value=3570,var_off=(0x0; 0xfffe)), since multiplying an 8-bit
|
||
|
value by constant 14 will keep upper 52 bits as zero, also the least significant
|
||
|
bit will be zero as 14 is even. Similarly ``r2 >>= 48`` will make
|
||
|
R2=inv(id=0,umax_value=65535,var_off=(0x0; 0xffff)), since the shift is not sign
|
||
|
extending. This logic is implemented in adjust_reg_min_max_vals() function,
|
||
|
which calls adjust_ptr_min_max_vals() for adding pointer to scalar (or vice
|
||
|
versa) and adjust_scalar_min_max_vals() for operations on two scalars.
|
||
|
|
||
|
The end result is that bpf program author can access packet directly
|
||
|
using normal C code as::
|
||
|
|
||
|
void *data = (void *)(long)skb->data;
|
||
|
void *data_end = (void *)(long)skb->data_end;
|
||
|
struct eth_hdr *eth = data;
|
||
|
struct iphdr *iph = data + sizeof(*eth);
|
||
|
struct udphdr *udp = data + sizeof(*eth) + sizeof(*iph);
|
||
|
|
||
|
if (data + sizeof(*eth) + sizeof(*iph) + sizeof(*udp) > data_end)
|
||
|
return 0;
|
||
|
if (eth->h_proto != htons(ETH_P_IP))
|
||
|
return 0;
|
||
|
if (iph->protocol != IPPROTO_UDP || iph->ihl != 5)
|
||
|
return 0;
|
||
|
if (udp->dest == 53 || udp->source == 9)
|
||
|
...;
|
||
|
|
||
|
which makes such programs easier to write comparing to LD_ABS insn
|
||
|
and significantly faster.
|
||
|
|
||
|
Pruning
|
||
|
=======
|
||
|
|
||
|
The verifier does not actually walk all possible paths through the program. For
|
||
|
each new branch to analyse, the verifier looks at all the states it's previously
|
||
|
been in when at this instruction. If any of them contain the current state as a
|
||
|
subset, the branch is 'pruned' - that is, the fact that the previous state was
|
||
|
accepted implies the current state would be as well. For instance, if in the
|
||
|
previous state, r1 held a packet-pointer, and in the current state, r1 holds a
|
||
|
packet-pointer with a range as long or longer and at least as strict an
|
||
|
alignment, then r1 is safe. Similarly, if r2 was NOT_INIT before then it can't
|
||
|
have been used by any path from that point, so any value in r2 (including
|
||
|
another NOT_INIT) is safe. The implementation is in the function regsafe().
|
||
|
Pruning considers not only the registers but also the stack (and any spilled
|
||
|
registers it may hold). They must all be safe for the branch to be pruned.
|
||
|
This is implemented in states_equal().
|
||
|
|
||
|
Understanding eBPF verifier messages
|
||
|
====================================
|
||
|
|
||
|
The following are few examples of invalid eBPF programs and verifier error
|
||
|
messages as seen in the log:
|
||
|
|
||
|
Program with unreachable instructions::
|
||
|
|
||
|
static struct bpf_insn prog[] = {
|
||
|
BPF_EXIT_INSN(),
|
||
|
BPF_EXIT_INSN(),
|
||
|
};
|
||
|
|
||
|
Error::
|
||
|
|
||
|
unreachable insn 1
|
||
|
|
||
|
Program that reads uninitialized register::
|
||
|
|
||
|
BPF_MOV64_REG(BPF_REG_0, BPF_REG_2),
|
||
|
BPF_EXIT_INSN(),
|
||
|
|
||
|
Error::
|
||
|
|
||
|
0: (bf) r0 = r2
|
||
|
R2 !read_ok
|
||
|
|
||
|
Program that doesn't initialize R0 before exiting::
|
||
|
|
||
|
BPF_MOV64_REG(BPF_REG_2, BPF_REG_1),
|
||
|
BPF_EXIT_INSN(),
|
||
|
|
||
|
Error::
|
||
|
|
||
|
0: (bf) r2 = r1
|
||
|
1: (95) exit
|
||
|
R0 !read_ok
|
||
|
|
||
|
Program that accesses stack out of bounds::
|
||
|
|
||
|
BPF_ST_MEM(BPF_DW, BPF_REG_10, 8, 0),
|
||
|
BPF_EXIT_INSN(),
|
||
|
|
||
|
Error::
|
||
|
|
||
|
0: (7a) *(u64 *)(r10 +8) = 0
|
||
|
invalid stack off=8 size=8
|
||
|
|
||
|
Program that doesn't initialize stack before passing its address into function::
|
||
|
|
||
|
BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
|
||
|
BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
|
||
|
BPF_LD_MAP_FD(BPF_REG_1, 0),
|
||
|
BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
|
||
|
BPF_EXIT_INSN(),
|
||
|
|
||
|
Error::
|
||
|
|
||
|
0: (bf) r2 = r10
|
||
|
1: (07) r2 += -8
|
||
|
2: (b7) r1 = 0x0
|
||
|
3: (85) call 1
|
||
|
invalid indirect read from stack off -8+0 size 8
|
||
|
|
||
|
Program that uses invalid map_fd=0 while calling to map_lookup_elem() function::
|
||
|
|
||
|
BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
|
||
|
BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
|
||
|
BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
|
||
|
BPF_LD_MAP_FD(BPF_REG_1, 0),
|
||
|
BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
|
||
|
BPF_EXIT_INSN(),
|
||
|
|
||
|
Error::
|
||
|
|
||
|
0: (7a) *(u64 *)(r10 -8) = 0
|
||
|
1: (bf) r2 = r10
|
||
|
2: (07) r2 += -8
|
||
|
3: (b7) r1 = 0x0
|
||
|
4: (85) call 1
|
||
|
fd 0 is not pointing to valid bpf_map
|
||
|
|
||
|
Program that doesn't check return value of map_lookup_elem() before accessing
|
||
|
map element::
|
||
|
|
||
|
BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
|
||
|
BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
|
||
|
BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
|
||
|
BPF_LD_MAP_FD(BPF_REG_1, 0),
|
||
|
BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
|
||
|
BPF_ST_MEM(BPF_DW, BPF_REG_0, 0, 0),
|
||
|
BPF_EXIT_INSN(),
|
||
|
|
||
|
Error::
|
||
|
|
||
|
0: (7a) *(u64 *)(r10 -8) = 0
|
||
|
1: (bf) r2 = r10
|
||
|
2: (07) r2 += -8
|
||
|
3: (b7) r1 = 0x0
|
||
|
4: (85) call 1
|
||
|
5: (7a) *(u64 *)(r0 +0) = 0
|
||
|
R0 invalid mem access 'map_value_or_null'
|
||
|
|
||
|
Program that correctly checks map_lookup_elem() returned value for NULL, but
|
||
|
accesses the memory with incorrect alignment::
|
||
|
|
||
|
BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
|
||
|
BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
|
||
|
BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
|
||
|
BPF_LD_MAP_FD(BPF_REG_1, 0),
|
||
|
BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
|
||
|
BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 1),
|
||
|
BPF_ST_MEM(BPF_DW, BPF_REG_0, 4, 0),
|
||
|
BPF_EXIT_INSN(),
|
||
|
|
||
|
Error::
|
||
|
|
||
|
0: (7a) *(u64 *)(r10 -8) = 0
|
||
|
1: (bf) r2 = r10
|
||
|
2: (07) r2 += -8
|
||
|
3: (b7) r1 = 1
|
||
|
4: (85) call 1
|
||
|
5: (15) if r0 == 0x0 goto pc+1
|
||
|
R0=map_ptr R10=fp
|
||
|
6: (7a) *(u64 *)(r0 +4) = 0
|
||
|
misaligned access off 4 size 8
|
||
|
|
||
|
Program that correctly checks map_lookup_elem() returned value for NULL and
|
||
|
accesses memory with correct alignment in one side of 'if' branch, but fails
|
||
|
to do so in the other side of 'if' branch::
|
||
|
|
||
|
BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
|
||
|
BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
|
||
|
BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
|
||
|
BPF_LD_MAP_FD(BPF_REG_1, 0),
|
||
|
BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
|
||
|
BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 2),
|
||
|
BPF_ST_MEM(BPF_DW, BPF_REG_0, 0, 0),
|
||
|
BPF_EXIT_INSN(),
|
||
|
BPF_ST_MEM(BPF_DW, BPF_REG_0, 0, 1),
|
||
|
BPF_EXIT_INSN(),
|
||
|
|
||
|
Error::
|
||
|
|
||
|
0: (7a) *(u64 *)(r10 -8) = 0
|
||
|
1: (bf) r2 = r10
|
||
|
2: (07) r2 += -8
|
||
|
3: (b7) r1 = 1
|
||
|
4: (85) call 1
|
||
|
5: (15) if r0 == 0x0 goto pc+2
|
||
|
R0=map_ptr R10=fp
|
||
|
6: (7a) *(u64 *)(r0 +0) = 0
|
||
|
7: (95) exit
|
||
|
|
||
|
from 5 to 8: R0=imm0 R10=fp
|
||
|
8: (7a) *(u64 *)(r0 +0) = 1
|
||
|
R0 invalid mem access 'imm'
|
||
|
|
||
|
Program that performs a socket lookup then sets the pointer to NULL without
|
||
|
checking it::
|
||
|
|
||
|
BPF_MOV64_IMM(BPF_REG_2, 0),
|
||
|
BPF_STX_MEM(BPF_W, BPF_REG_10, BPF_REG_2, -8),
|
||
|
BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
|
||
|
BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
|
||
|
BPF_MOV64_IMM(BPF_REG_3, 4),
|
||
|
BPF_MOV64_IMM(BPF_REG_4, 0),
|
||
|
BPF_MOV64_IMM(BPF_REG_5, 0),
|
||
|
BPF_EMIT_CALL(BPF_FUNC_sk_lookup_tcp),
|
||
|
BPF_MOV64_IMM(BPF_REG_0, 0),
|
||
|
BPF_EXIT_INSN(),
|
||
|
|
||
|
Error::
|
||
|
|
||
|
0: (b7) r2 = 0
|
||
|
1: (63) *(u32 *)(r10 -8) = r2
|
||
|
2: (bf) r2 = r10
|
||
|
3: (07) r2 += -8
|
||
|
4: (b7) r3 = 4
|
||
|
5: (b7) r4 = 0
|
||
|
6: (b7) r5 = 0
|
||
|
7: (85) call bpf_sk_lookup_tcp#65
|
||
|
8: (b7) r0 = 0
|
||
|
9: (95) exit
|
||
|
Unreleased reference id=1, alloc_insn=7
|
||
|
|
||
|
Program that performs a socket lookup but does not NULL-check the returned
|
||
|
value::
|
||
|
|
||
|
BPF_MOV64_IMM(BPF_REG_2, 0),
|
||
|
BPF_STX_MEM(BPF_W, BPF_REG_10, BPF_REG_2, -8),
|
||
|
BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
|
||
|
BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
|
||
|
BPF_MOV64_IMM(BPF_REG_3, 4),
|
||
|
BPF_MOV64_IMM(BPF_REG_4, 0),
|
||
|
BPF_MOV64_IMM(BPF_REG_5, 0),
|
||
|
BPF_EMIT_CALL(BPF_FUNC_sk_lookup_tcp),
|
||
|
BPF_EXIT_INSN(),
|
||
|
|
||
|
Error::
|
||
|
|
||
|
0: (b7) r2 = 0
|
||
|
1: (63) *(u32 *)(r10 -8) = r2
|
||
|
2: (bf) r2 = r10
|
||
|
3: (07) r2 += -8
|
||
|
4: (b7) r3 = 4
|
||
|
5: (b7) r4 = 0
|
||
|
6: (b7) r5 = 0
|
||
|
7: (85) call bpf_sk_lookup_tcp#65
|
||
|
8: (95) exit
|
||
|
Unreleased reference id=1, alloc_insn=7
|