=======================================================
Hardware-assisted AddressSanitizer Design Documentation
=======================================================

This page is a design document for
**hardware-assisted AddressSanitizer** (or **HWASAN**),
a tool similar to :doc:`AddressSanitizer`,
but based on partial hardware assistance.

Introduction
============

:doc:`AddressSanitizer`
tags every 8 bytes of the application memory with a 1 byte tag (using *shadow memory*),
uses *redzones* to find buffer-overflows and
*quarantine* to find use-after-free.
The redzones, the quarantine, and, to a lesser extent, the shadow, are the
sources of AddressSanitizer's memory overhead.
See the `AddressSanitizer paper`_ for details.

AArch64 has `Address Tagging`_ (or top-byte-ignore, TBI), a hardware feature that allows
software to use the 8 most significant bits of a 64-bit pointer as
a tag. HWASAN uses `Address Tagging`_
to implement a memory safety tool, similar to :doc:`AddressSanitizer`,
but with smaller memory overhead and slightly different (mostly better)
accuracy guarantees.
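
As a rough illustration of what top-byte-ignore makes possible, the helpers below place an 8-bit tag in bits 56-63 of a 64-bit address value and strip it again. They operate on integers rather than real pointers, and the names (``tag_pointer``, ``pointer_tag``, ``untag_pointer``) are hypothetical, not part of any HWASAN API.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers for illustration. With TBI, the hardware ignores
 * bits 56-63 of a data address, so software can keep an 8-bit tag there
 * without changing which memory is accessed. */
#define TAG_SHIFT 56
#define ADDR_MASK ((UINT64_C(1) << TAG_SHIFT) - 1)

/* Replace the top byte of an address value with `tag`. */
static uint64_t tag_pointer(uint64_t addr, uint8_t tag) {
    return (addr & ADDR_MASK) | ((uint64_t)tag << TAG_SHIFT);
}

/* Read the tag stored in the top byte. */
static uint8_t pointer_tag(uint64_t addr) {
    return (uint8_t)(addr >> TAG_SHIFT);
}

/* Clear the top byte, leaving the address bits the hardware actually uses. */
static uint64_t untag_pointer(uint64_t addr) {
    return addr & ADDR_MASK;
}
```

For example, ``tag_pointer(0x7fff12345670, 0x2d)`` yields ``0x2d007fff12345670``; on AArch64 with TBI enabled, both values address the same byte.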

Algorithm
=========

* Every heap/stack/global memory object is forcibly aligned to `TG` bytes
  (`TG` is e.g. 16 or 64). We call `TG` the **tagging granularity**.
* For every such object a random `TS`-bit tag `T` is chosen (`TS`, or tag size, is e.g. 4 or 8).
* The pointer to the object is tagged with `T`.
* The memory for the object is also tagged with `T` (using a `TG=>1` shadow memory).
* Every load and store is instrumented to read the memory tag and compare it
  with the pointer tag; an exception is raised on a tag mismatch.

For a more detailed discussion of this approach, see https://arxiv.org/pdf/1802.09517.pdf
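
The steps above can be modeled in a few lines of C. This is a toy sketch under assumed parameters (`TG` = 16, `TS` = 8): a small array of shadow bytes stands in for the `TG=>1` shadow, and "addresses" are offsets into a 256-byte region with the tag in the top byte. The names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model, not the HWASAN runtime. TG = 16 and an 8-bit tag are assumed.
 * One shadow byte per TG-byte granule of a HEAP_SIZE-byte region. */
enum { TG = 16, HEAP_SIZE = 256 };

static uint8_t shadow[HEAP_SIZE / TG];

/* Tag every granule overlapping [addr, addr + size) with `tag`.
 * `addr` must be TG-aligned; the size is effectively rounded up to TG. */
static void tag_memory(uint64_t addr, size_t size, uint8_t tag) {
    assert(addr % TG == 0 && addr + size <= HEAP_SIZE);
    for (uint64_t a = addr; a < addr + size; a += TG)
        shadow[a / TG] = tag;
}

/* The check an instrumented load/store performs: compare the pointer's
 * top-byte tag with the shadow tag of the granule it points into.
 * Returns 1 if the access is allowed, 0 on a tag mismatch. */
static int check_access(uint64_t tagged_ptr) {
    uint8_t ptr_tag = (uint8_t)(tagged_ptr >> 56);
    uint64_t addr = tagged_ptr & ((UINT64_C(1) << 56) - 1);
    assert(addr < HEAP_SIZE);
    return shadow[addr / TG] == ptr_tag;
}
```

Tagging a 32-byte object at offset 32 with ``0xa1`` makes accesses through a ``0xa1``-tagged pointer to offsets 32..63 pass, while offset 64 (the next, untagged granule) or a pointer carrying any other tag fails the check.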

Short granules
--------------

A short granule is a granule of size between 1 and `TG-1` bytes. The size
of a short granule is stored at the location in shadow memory where the
granule's tag is normally stored, while the granule's actual tag is stored
in the last byte of the granule. This means that in order to verify that a
pointer tag matches a memory tag, HWASAN must check for two possibilities:

* the pointer tag is equal to the memory tag in shadow memory, or
* the shadow memory tag is actually a short granule size, the value being loaded
  is in bounds of the granule and the pointer tag is equal to the last byte of
  the granule.
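
The two possibilities can be made concrete with a small model. The sketch below assumes `TG` = 16, uses plain arrays for memory and shadow, and treats an "address" as an index into ``mem[]`` with the pointer tag in the top byte; ``tag_object`` and ``access_ok`` are hypothetical names. It also assumes an access never crosses a granule boundary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model with TG = 16 (assumed). */
enum { TG = 16, MEM_SIZE = 64 };
static uint8_t mem[MEM_SIZE];
static uint8_t shadow[MEM_SIZE / TG];

/* Tag an object of `size` bytes at TG-aligned `addr`. Full granules get
 * `tag` in the shadow; a trailing partial granule becomes a short granule:
 * its shadow byte holds the number of used bytes, its last byte holds the tag. */
static void tag_object(size_t addr, size_t size, uint8_t tag) {
    size_t full = size / TG, rem = size % TG;
    for (size_t g = 0; g < full; g++)
        shadow[addr / TG + g] = tag;
    if (rem != 0) {
        size_t g = addr / TG + full;
        shadow[g] = (uint8_t)rem;
        mem[g * TG + TG - 1] = tag;
    }
}

/* Check an access of `len` bytes (len >= 1) through `tagged_ptr`. */
static int access_ok(uint64_t tagged_ptr, size_t len) {
    uint8_t ptr_tag = (uint8_t)(tagged_ptr >> 56);
    size_t addr = (size_t)(tagged_ptr & ((UINT64_C(1) << 56) - 1));
    assert(addr < MEM_SIZE);
    uint8_t mem_tag = shadow[addr / TG];
    if (mem_tag == ptr_tag)
        return 1;                      /* case 1: tags match */
    if (mem_tag > TG - 1)
        return 0;                      /* not a short granule size */
    size_t last = addr % TG + len - 1; /* granule position of last byte accessed */
    if (last >= mem_tag)
        return 0;                      /* beyond the short granule's used bytes */
    /* case 2: compare against the tag kept in the granule's last byte */
    return mem[addr - addr % TG + TG - 1] == ptr_tag;
}
```

With a 20-byte object at offset 16, the second granule is short (4 used bytes), so a 4-byte load at offset 32 passes, while a 4-byte load at offset 34 or a 1-byte load at offset 36 is reported.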

Pointer tags between 1 and `TG-1` are possible and are as likely as any other
tag. This means that these tags in memory have two interpretations: the full
tag interpretation (where the pointer tag is between 1 and `TG-1` and the
last byte of the granule is ordinary data) and the short tag interpretation
(where the pointer tag is stored in the granule).

When HWASAN detects an error near a memory tag between 1 and `TG-1`, it
will show both the memory tag and the last byte of the granule. Currently,
it is up to the user to disambiguate the two possibilities.

Instrumentation
===============

Memory Accesses
---------------

In the majority of cases, memory accesses are prefixed with a call to
an outlined instruction sequence that verifies the tags. The code size
and performance overhead of the call is reduced by using a custom calling
convention that

* preserves most registers, and
* is specialized to the register containing the address, and the type and
  size of the memory access.

Currently, the following sequence is used:

.. code-block:: none

  // int foo(int *a) { return *a; }
  // clang -O2 --target=aarch64-linux-android30 -fsanitize=hwaddress -S -o - load.c
  [...]
  foo:
        stp     x30, x20, [sp, #-16]!
        adrp    x20, :got:__hwasan_shadow                // load shadow address from GOT into x20
        ldr     x20, [x20, :got_lo12:__hwasan_shadow]
        bl      __hwasan_check_x0_2_short_v2             // call outlined tag check
                                                         // (arguments: x0 = address, x20 = shadow base;
                                                         // "2" encodes the access type and size)
        ldr     w0, [x0]                                 // inline load
        ldp     x30, x20, [sp], #16
        ret

  [...]
  __hwasan_check_x0_2_short_v2:
        sbfx    x16, x0, #4, #52                         // shadow offset
        ldrb    w16, [x20, x16]                          // load shadow tag
        cmp     x16, x0, lsr #56                         // extract address tag, compare with shadow tag
        b.ne    .Ltmp0                                   // jump to short tag handler on mismatch
  .Ltmp1:
        ret
  .Ltmp0:
        cmp     w16, #15                                 // is this a short tag?
        b.hi    .Ltmp2                                   // if not, error
        and     x17, x0, #0xf                            // find the address's position in the short granule
        add     x17, x17, #3                             // adjust to the position of the last byte loaded
        cmp     w16, w17                                 // check that position is in bounds
        b.ls    .Ltmp2                                   // if not, error
        orr     x16, x0, #0xf                            // compute address of last byte of granule
        ldrb    w16, [x16]                               // load tag from it
        cmp     x16, x0, lsr #56                         // compare with pointer tag
        b.eq    .Ltmp1                                   // if matches, continue
  .Ltmp2:
        stp     x0, x1, [sp, #-256]!                     // save original x0, x1 on stack (they will be overwritten)
        stp     x29, x30, [sp, #232]                     // create frame record
        mov     x1, #2                                   // set x1 to a constant indicating the type of failure
        adrp    x16, :got:__hwasan_tag_mismatch_v2       // call runtime function to save remaining registers and report error
        ldr     x16, [x16, :got_lo12:__hwasan_tag_mismatch_v2] // (load address from GOT to avoid potential register clobbers in delay load handler)
        br      x16
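
For readers less familiar with AArch64 assembly, the fast and short-granule paths of the outlined check can be transliterated into C roughly as follows. The function name, the ``mem`` array standing in for memory, and the explicit tag stripping (which TBI performs in hardware) are assumptions of this sketch. The ``+ 3`` reflects a 4-byte load, matching ``add x17, x17, #3`` above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical C rendering of __hwasan_check_x0_2_short_v2 for a 4-byte load.
 * shadow_base plays the role of x20; mem[] stands in for memory, indexed by
 * untagged address. Returns 1 where the assembly returns (.Ltmp1) and 0
 * where it branches to the error path (.Ltmp2). */
static int check_x0_2_short(uint64_t x0, const uint8_t *shadow_base,
                            const uint8_t *mem) {
    /* sbfx x16, x0, #4, #52: sign-extended bits 4..55 of x0 (tag discarded).
     * Relies on modular signed conversion and arithmetic >> on negative
     * values, as provided by mainstream compilers. */
    int64_t offset = (int64_t)(x0 << 8) >> 12;
    uint8_t shadow_tag = shadow_base[offset];   /* ldrb w16, [x20, x16] */
    uint8_t ptr_tag = (uint8_t)(x0 >> 56);      /* x0, lsr #56 */
    if (shadow_tag == ptr_tag)                  /* cmp; b.ne .Ltmp0 */
        return 1;
    if (shadow_tag > 15)                        /* cmp w16, #15; b.hi .Ltmp2 */
        return 0;
    uint64_t last = (x0 & 0xf) + 3;             /* and x17 ...; add x17, x17, #3 */
    if (shadow_tag <= last)                     /* cmp w16, w17; b.ls .Ltmp2 */
        return 0;
    /* orr x16, x0, #0xf; ldrb w16, [x16]: TBI ignores the top byte, so the
     * model strips it before indexing mem[]. */
    uint64_t last_byte = (x0 | 0xf) & ((UINT64_C(1) << 56) - 1);
    return mem[last_byte] == ptr_tag;           /* cmp; b.eq .Ltmp1 */
}
```

With a short granule of 6 used bytes at offset 16, a 4-byte load at offset 18 (bytes 18..21) passes, while one at offset 19 (bytes 19..22) takes the error path.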

Heap
----

Tagging the heap memory/pointers is done by `malloc`.
This can be based on any malloc that forces all objects to be TG-aligned.
`free` tags the memory with a different tag.
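
A toy allocator following this description might look like the sketch below. It assumes a bump allocator over a static arena with `TG` = 16, a xorshift generator standing in for the tag source, and a caller that passes the size to ``toy_free`` (a real allocator would record it); short granules are not modeled. None of these names come from the HWASAN runtime.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy tagging allocator (illustration only). Addresses are arena offsets
 * with the tag in bits 56-63. */
enum { TG = 16, ARENA = 1024 };
#define ADDR_MASK ((UINT64_C(1) << 56) - 1)

static uint8_t shadow[ARENA / TG];
static size_t next_free;            /* bump pointer, always TG-aligned */
static uint32_t rng = 2463534242u;  /* xorshift32 state, stand-in tag source */

static uint8_t random_tag(void) {
    rng ^= rng << 13;
    rng ^= rng >> 17;
    rng ^= rng << 5;
    return (uint8_t)rng;
}

static size_t round_up(size_t size) { return (size + TG - 1) / TG * TG; }

static void set_tags(size_t addr, size_t size, uint8_t tag) {
    for (size_t a = addr; a < addr + size; a += TG)
        shadow[a / TG] = tag;
}

/* malloc: TG-align the object, pick a random tag, tag memory and pointer. */
static uint64_t toy_malloc(size_t size) {
    size_t rounded = round_up(size);
    assert(next_free + rounded <= ARENA);
    size_t addr = next_free;
    next_free += rounded;
    uint8_t tag = random_tag();
    set_tags(addr, rounded, tag);
    return ((uint64_t)tag << 56) | addr;
}

/* free: retag the memory with a different tag so stale pointers mismatch. */
static void toy_free(uint64_t ptr, size_t size) {
    uint8_t old = (uint8_t)(ptr >> 56);
    uint8_t fresh = random_tag();
    if (fresh == old)
        fresh = (uint8_t)(old + 1);
    set_tags((size_t)(ptr & ADDR_MASK), round_up(size), fresh);
}

/* Would an access through `ptr` pass the tag check? */
static int tags_match(uint64_t ptr) {
    return shadow[(ptr & ADDR_MASK) / TG] == (uint8_t)(ptr >> 56);
}
```

After ``toy_free``, every granule of the object carries a tag different from the one in the old pointer, so a use-after-free through that pointer fails the check.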

Stack
-----

Stack frames are instrumented by aligning all non-promotable allocas
to `TG` and tagging stack memory in the function prologue and epilogue.

Tags for different allocas in one function are **not** generated
independently; doing that in a function with `M` allocas would require
maintaining `M` live stack pointers, significantly increasing register
pressure. Instead we generate a single base tag value in the prologue,
and build the tag for alloca number `M` as `ReTag(BaseTag, M)`, where
ReTag can be as simple as exclusive-or with constant `M`.
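
Assuming ``ReTag`` is the exclusive-or mentioned above, deriving per-alloca tags from one base tag costs one operation per alloca and needs only the base tag to stay live. The helper names are hypothetical, and how the base tag is drawn in the prologue is left out.

```c
#include <assert.h>
#include <stdint.h>

/* ReTag as exclusive-or with the alloca index (the simple choice named in
 * the text). base_tag would be drawn once per frame in the prologue. */
static uint8_t retag(uint8_t base_tag, unsigned alloca_index) {
    return (uint8_t)(base_tag ^ alloca_index);
}

/* Tag an untagged stack address for alloca number `index`, using only the
 * single base tag kept live for the whole function. */
static uint64_t tag_alloca(uint64_t untagged_addr, uint8_t base_tag,
                           unsigned index) {
    return (untagged_addr & ((UINT64_C(1) << 56) - 1)) |
           ((uint64_t)retag(base_tag, index) << 56);
}
```

Because XOR with a fixed value is a bijection, allocas with different indices below 256 get different 8-bit tags for any given base tag.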

Stack instrumentation is expected to be a major source of overhead,
but could be optional.

Globals
-------

Most globals in HWASAN-instrumented code are tagged. This is accomplished
using the following mechanisms:

* The address of each global has a static tag associated with it. The first
  defined global in a translation unit has a pseudorandom tag associated
  with it, based on the hash of the file path. Each subsequent global's tag
  is obtained by incrementing the previously assigned tag.

* The global's tag is added to its symbol address in the object file's symbol
  table. This causes the global's address to be tagged when its address is
  taken.

* When the address of a global is taken directly (i.e. not via the GOT), a special
  instruction sequence needs to be used to add the tag to the address,
  because the tag would otherwise take the address outside of the small code
  model (4GB on AArch64). No changes are required when the address is taken
  via the GOT because the address stored in the GOT will contain the tag.

* An associated ``hwasan_globals`` section is emitted for each tagged global,
  which indicates the address of the global, its size and its tag. These
  sections are concatenated by the linker into a single ``hwasan_globals``
  section that is enumerated by the runtime (via an ELF note) when a binary
  is loaded and the memory is tagged accordingly.
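
The tag assignment described in the first bullet can be sketched as follows. The hash is a stand-in: FNV-1a is used here only because it is short, and the real compiler may use a different hash and may skip some tag values. Both helpers are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Pseudorandom tag for the first global of a translation unit, derived
 * from the file path. FNV-1a is an illustrative stand-in hash. */
static uint8_t first_global_tag(const char *path) {
    uint32_t h = 2166136261u;            /* FNV-1a 32-bit offset basis */
    for (const char *p = path; *p; p++) {
        h ^= (uint8_t)*p;
        h *= 16777619u;                  /* FNV-1a 32-bit prime */
    }
    return (uint8_t)h;
}

/* Tag of the i-th defined global (i = 0 for the first): each subsequent
 * tag is the previous one plus one, wrapping at 8 bits. */
static uint8_t global_tag(const char *path, unsigned i) {
    return (uint8_t)(first_global_tag(path) + i);
}
```

Since the tags depend only on the file path and the definition order, they are fixed at compile time, which is what allows them to be folded into the symbol values.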

A complete example is given below:

.. code-block:: none

  // int x = 1; int *f() { return &x; }
  // clang -O2 --target=aarch64-linux-android30 -fsanitize=hwaddress -S -o - global.c

  [...]
  f:
        adrp    x0, :pg_hi21_nc:x            // set bits 12-63 to upper bits of untagged address
        movk    x0, #:prel_g3:x+0x100000000  // set bits 48-63 to tag
        add     x0, x0, :lo12:x              // set bits 0-11 to lower bits of address
        ret

  [...]
        .data
  .Lx.hwasan:
        .word   1

        .globl  x
        .set x, .Lx.hwasan+0x2d00000000000000

  [...]
        .section  .note.hwasan.globals,"aG",@note,hwasan.module_ctor,comdat
  .Lhwasan.note:
        .word   8   // namesz
        .word   8   // descsz
        .word   3   // NT_LLVM_HWASAN_GLOBALS
        .asciz  "LLVM\000\000\000"
        .word   __start_hwasan_globals-.Lhwasan.note
        .word   __stop_hwasan_globals-.Lhwasan.note

  [...]
        .section  hwasan_globals,"ao",@progbits,.Lx.hwasan,unique,2
  .Lx.hwasan.descriptor:
        .word   .Lx.hwasan-.Lx.hwasan.descriptor
        .word   0x2d000004  // tag = 0x2d, size = 4
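
The descriptor at the end of the listing packs the global's tag and size into one 32-bit word. Reading the layout off this example (a signed 32-bit offset from the descriptor to the global, then a word with the tag in the top byte and the size in the low 24 bits), a runtime could decode it as below. The struct and function names are hypothetical, and the 24-bit size field is inferred from ``0x2d000004`` rather than taken from a specification.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical view of one hwasan_globals descriptor, as laid out above. */
struct global_desc {
    int32_t rel_addr;       /* .Lx.hwasan - .Lx.hwasan.descriptor */
    uint32_t tag_and_size;  /* e.g. 0x2d000004 */
};

static uint8_t desc_tag(const struct global_desc *d) {
    return (uint8_t)(d->tag_and_size >> 24);
}

static uint32_t desc_size(const struct global_desc *d) {
    return d->tag_and_size & 0xffffff;
}

/* Untagged address of the global, relative to the descriptor's address.
 * The signed offset is sign-extended; unsigned addition wraps as needed. */
static uint64_t desc_global_addr(uint64_t desc_addr, const struct global_desc *d) {
    return desc_addr + (uint64_t)(int64_t)d->rel_addr;
}

/* The symbol value produced by `.set x, .Lx.hwasan+0x2d00000000000000`. */
static uint64_t tagged_symbol(uint64_t untagged_addr, uint8_t tag) {
    return untagged_addr + ((uint64_t)tag << 56);
}
```

For the descriptor in the listing, ``desc_tag`` gives ``0x2d`` and ``desc_size`` gives 4, the size of ``int x``.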

Error reporting
---------------

Errors are generated by the `HLT` instruction and are handled by a signal handler.

Attribute
---------

HWASAN uses its own LLVM IR Attribute `sanitize_hwaddress` and a matching
C function attribute. An alternative would be to re-use ASAN's attribute
`sanitize_address`. The reasons to use a separate attribute are:

* Users may need to disable ASAN but not HWASAN, or vice versa,
  because the tools have different trade-offs and compatibility issues.
* LLVM (ideally) does not use flags to decide which pass is being used;
  whether ASAN or HWASAN is applied is decided based on the function attributes.

This does mean that users of HWASAN may need to add the new attribute
to the code that already uses the old attribute.
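
As an illustration of the first point, Clang's ``no_sanitize`` function attribute accepts the tool names ``"address"`` and ``"hwaddress"``, so a function can be excluded from one tool while remaining instrumented by the other. The function names below are made up, and the exact spellings a given compiler version accepts are worth checking in its documentation.

```c
#include <assert.h>

/* Excluded from ASAN instrumentation, still checked when built with
 * -fsanitize=hwaddress. Without any -fsanitize flag, both attributes have
 * no effect and the functions compile as ordinary code. */
__attribute__((no_sanitize("address")))
static int read_skip_asan(const int *p) { return *p; }

/* Excluded from HWASAN instrumentation, still checked when built with
 * -fsanitize=address. */
__attribute__((no_sanitize("hwaddress")))
static int read_skip_hwasan(const int *p) { return *p; }
```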

Comparison with AddressSanitizer
================================

HWASAN:

* Is less portable than :doc:`AddressSanitizer`
  as it relies on hardware `Address Tagging`_ (AArch64).
  Address Tagging can be emulated with compiler instrumentation,
  but it will require the instrumentation to remove the tags before
  any load or store, which is infeasible in any realistic environment
  that contains non-instrumented code.
* May have compatibility problems if the target code uses higher
  pointer bits for other purposes.
* May require changes in the OS kernels (e.g. Linux seems to dislike
  tagged pointers passed from user space:
  https://www.kernel.org/doc/Documentation/arm64/tagged-pointers.txt).
* **Does not require redzones to detect buffer overflows**,
  but the buffer overflow detection is probabilistic, with roughly a
  `1/(2**TS)` chance of missing a bug (6.25% or 0.39% with 4 and 8-bit TS
  respectively).
* **Does not require quarantine to detect heap-use-after-free,
  or stack-use-after-return**.
  The detection is similarly probabilistic.

The memory overhead of HWASAN is expected to be much smaller
than that of AddressSanitizer:
`1/TG` extra memory for the shadow
and some overhead due to `TG`-aligning all objects.
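
The two figures above are simple ratios. The helpers below (hypothetical names) compute the probability that a random `TS`-bit tag happens to match, and the shadow size as a fraction of application memory for a given `TG`.

```c
#include <assert.h>

/* Chance that a bug goes undetected because a random TS-bit tag collides
 * with the expected one: 1 / 2^TS. */
static double miss_probability(unsigned ts) {
    return 1.0 / (double)(1u << ts);
}

/* Shadow memory as a fraction of application memory: one byte per TG bytes. */
static double shadow_overhead(unsigned tg) {
    return 1.0 / (double)tg;
}
```

With `TG` = 16 the shadow costs 1/16 = 6.25% of application memory, half of the 1/8 implied by AddressSanitizer's one shadow byte per 8 bytes, before AddressSanitizer's redzones and quarantine are counted.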

Supported architectures
=======================

HWASAN relies on `Address Tagging`_, which is only available on AArch64.
For other 64-bit architectures it is possible to remove the address tags
before every load and store by compiler instrumentation, but this variant
will have limited deployability since not all of the code is
typically instrumented.
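
To make the software-only variant concrete, an instrumented load on hardware without top-byte-ignore would have to clear the tag before dereferencing, as in the sketch below. The helper name and mask are illustrative assumptions, and converting integers to pointers is implementation-defined in C. Uninstrumented code handed the same tagged value would skip this step and pass the tag bits to the hardware as part of the address, which is why the variant is hard to deploy.

```c
#include <assert.h>
#include <stdint.h>

/* Clears bits 56-63, keeping the address bits. */
#define TAG_MASK (~(UINT64_C(0xff) << 56))

/* A load as instrumentation might emit it without TBI: strip the tag,
 * then dereference the untagged address. */
static int instrumented_load(uint64_t tagged_ptr) {
    const int *p = (const int *)(uintptr_t)(tagged_ptr & TAG_MASK);
    return *p;
}
```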

HWASAN's approach is not applicable to 32-bit architectures.

Related Work
============

* `SPARC ADI`_ implements a similar tool mostly in hardware.
* `Effective and Efficient Memory Protection Using Dynamic Tainting`_ discusses
  similar approaches ("lock & key").
* `Watchdog`_ discusses a heavier, but still somewhat similar
  "lock & key" approach.
* *TODO: add more "related work" links. Suggestions are welcome.*

.. _Watchdog: https://www.cis.upenn.edu/acg/papers/isca12_watchdog.pdf
.. _Effective and Efficient Memory Protection Using Dynamic Tainting: https://www.cc.gatech.edu/~orso/papers/clause.doudalis.orso.prvulovic.pdf
.. _SPARC ADI: https://lazytyped.blogspot.com/2017/09/getting-started-with-adi.html
.. _AddressSanitizer paper: https://www.usenix.org/system/files/conference/atc12/atc12-final39.pdf
.. _Address Tagging: http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.den0024a/ch12s05s01.html