223 lines
9.4 KiB
ReStructuredText
223 lines
9.4 KiB
ReStructuredText
|
Assembler Annotations
|
||
|
=====================
|
||
|
|
||
|
Copyright (c) 2017-2019 Jiri Slaby
|
||
|
|
||
|
This document describes the new macros for annotation of data and code in
|
||
|
assembly. In particular, it contains information about ``SYM_FUNC_START``,
|
||
|
``SYM_FUNC_END``, ``SYM_CODE_START``, and similar.
|
||
|
|
||
|
Rationale
|
||
|
---------
|
||
|
Some code like entries, trampolines, or boot code needs to be written in
|
||
|
assembly. The same as in C, such code is grouped into functions and
|
||
|
accompanied with data. Standard assemblers do not force users into precisely
|
||
|
marking these pieces as code, data, or even specifying their length.
|
||
|
Nevertheless, assemblers provide developers with such annotations to aid
|
||
|
debuggers throughout assembly. On top of that, developers also want to mark
|
||
|
some functions as *global* in order to be visible outside of their translation
|
||
|
units.
|
||
|
|
||
|
Over time, the Linux kernel has adopted macros from various projects (like
|
||
|
``binutils``) to facilitate such annotations. So for historic reasons,
|
||
|
developers have been using ``ENTRY``, ``END``, ``ENDPROC``, and other
|
||
|
annotations in assembly. Due to the lack of their documentation, the macros
|
||
|
are used in rather wrong contexts at some locations. Clearly, ``ENTRY`` was
|
||
|
intended to denote the beginning of global symbols (be it data or code).
|
||
|
``END`` used to mark the end of data or end of special functions with
|
||
|
*non-standard* calling convention. In contrast, ``ENDPROC`` should annotate
|
||
|
only ends of *standard* functions.
|
||
|
|
||
|
When these macros are used correctly, they help assemblers generate a nice
|
||
|
object with both sizes and types set correctly. For example, the result of
|
||
|
``arch/x86/lib/putuser.S``::
|
||
|
|
||
|
Num: Value Size Type Bind Vis Ndx Name
|
||
|
25: 0000000000000000 33 FUNC GLOBAL DEFAULT 1 __put_user_1
|
||
|
29: 0000000000000030 37 FUNC GLOBAL DEFAULT 1 __put_user_2
|
||
|
32: 0000000000000060 36 FUNC GLOBAL DEFAULT 1 __put_user_4
|
||
|
35: 0000000000000090 37 FUNC GLOBAL DEFAULT 1 __put_user_8
|
||
|
|
||
|
This is not only important for debugging purposes. When there are properly
|
||
|
annotated objects like this, tools can be run on them to generate more useful
|
||
|
information. In particular, on properly annotated objects, ``objtool`` can be
|
||
|
run to check and fix the object if needed. Currently, ``objtool`` can report
|
||
|
missing frame pointer setup/destruction in functions. It can also
|
||
|
automatically generate annotations for the ORC unwinder
|
||
|
(Documentation/x86/orc-unwinder.rst)
|
||
|
for most code. Both of these are especially important to support reliable
|
||
|
stack traces which are in turn necessary for kernel live patching
|
||
|
(Documentation/livepatch/livepatch.rst).
|
||
|
|
||
|
Caveat and Discussion
|
||
|
---------------------
|
||
|
As one might realize, there were only three macros previously. That is indeed
|
||
|
insufficient to cover all the combinations of cases:
|
||
|
|
||
|
* standard/non-standard function
|
||
|
* code/data
|
||
|
* global/local symbol
|
||
|
|
||
|
There was a discussion_ and instead of extending the current ``ENTRY/END*``
|
||
|
macros, it was decided that brand new macros should be introduced instead::
|
||
|
|
||
|
So how about using macro names that actually show the purpose, instead
|
||
|
of importing all the crappy, historic, essentially randomly chosen
|
||
|
debug symbol macro names from the binutils and older kernels?
|
||
|
|
||
|
.. _discussion: https://lore.kernel.org/r/20170217104757.28588-1-jslaby@suse.cz
|
||
|
|
||
|
Macros Description
|
||
|
------------------
|
||
|
|
||
|
The new macros are prefixed with the ``SYM_`` prefix and can be divided into
|
||
|
three main groups:
|
||
|
|
||
|
1. ``SYM_FUNC_*`` -- to annotate C-like functions. This means functions with
|
||
|
standard C calling conventions. For example, on x86, this means that the
|
||
|
stack contains a return address at the predefined place and a return from
|
||
|
the function can happen in a standard way. When frame pointers are enabled,
|
||
|
save/restore of frame pointer shall happen at the start/end of a function,
|
||
|
respectively, too.
|
||
|
|
||
|
Checking tools like ``objtool`` should ensure such marked functions conform
|
||
|
to these rules. The tools can also easily annotate these functions with
|
||
|
debugging information (like *ORC data*) automatically.
|
||
|
|
||
|
2. ``SYM_CODE_*`` -- special functions called with special stack. Be it
|
||
|
interrupt handlers with special stack content, trampolines, or startup
|
||
|
functions.
|
||
|
|
||
|
Checking tools mostly ignore checking of these functions. But some debug
|
||
|
information still can be generated automatically. For correct debug data,
|
||
|
this code needs hints like ``UNWIND_HINT_REGS`` provided by developers.
|
||
|
|
||
|
3. ``SYM_DATA*`` -- obviously data belonging to ``.data`` sections and not to
|
||
|
``.text``. Data do not contain instructions, so they have to be treated
|
||
|
specially by the tools: they should not treat the bytes as instructions,
|
||
|
nor assign any debug information to them.
|
||
|
|
||
|
Instruction Macros
|
||
|
~~~~~~~~~~~~~~~~~~
|
||
|
This section covers ``SYM_FUNC_*`` and ``SYM_CODE_*`` enumerated above.
|
||
|
|
||
|
``objtool`` requires that all code must be contained in an ELF symbol. Symbol
|
||
|
names that have a ``.L`` prefix do not emit symbol table entries. ``.L``
|
||
|
prefixed symbols can be used within a code region, but should be avoided for
|
||
|
denoting a range of code via ``SYM_*_START/END`` annotations.
|
||
|
|
||
|
* ``SYM_FUNC_START`` and ``SYM_FUNC_START_LOCAL`` are supposed to be **the
|
||
|
most frequent markings**. They are used for functions with standard calling
|
||
|
conventions -- global and local. Like in C, they both align the functions to
|
||
|
architecture specific ``__ALIGN`` bytes. There are also ``_NOALIGN`` variants
|
||
|
for special cases where developers do not want this implicit alignment.
|
||
|
|
||
|
``SYM_FUNC_START_WEAK`` and ``SYM_FUNC_START_WEAK_NOALIGN`` markings are
|
||
|
also offered as an assembler counterpart to the *weak* attribute known from
|
||
|
C.
|
||
|
|
||
|
All of these **shall** be coupled with ``SYM_FUNC_END``. First, it marks
|
||
|
the sequence of instructions as a function and computes its size to the
|
||
|
generated object file. Second, it also eases checking and processing such
|
||
|
object files as the tools can trivially find exact function boundaries.
|
||
|
|
||
|
So in most cases, developers should write something like in the following
|
||
|
example, having some asm instructions in between the macros, of course::
|
||
|
|
||
|
SYM_FUNC_START(memset)
|
||
|
... asm insns ...
|
||
|
SYM_FUNC_END(memset)
|
||
|
|
||
|
In fact, this kind of annotation corresponds to the now deprecated ``ENTRY``
|
||
|
and ``ENDPROC`` macros.
|
||
|
|
||
|
* ``SYM_FUNC_ALIAS``, ``SYM_FUNC_ALIAS_LOCAL``, and ``SYM_FUNC_ALIAS_WEAK`` can
|
||
|
be used to define multiple names for a function. The typical use is::
|
||
|
|
||
|
SYM_FUNC_START(__memset)
|
||
|
... asm insns ...
|
||
|
SYN_FUNC_END(__memset)
|
||
|
SYM_FUNC_ALIAS(memset, __memset)
|
||
|
|
||
|
In this example, one can call ``__memset`` or ``memset`` with the same
|
||
|
result, except the debug information for the instructions is generated to
|
||
|
the object file only once -- for the non-``ALIAS`` case.
|
||
|
|
||
|
* ``SYM_CODE_START`` and ``SYM_CODE_START_LOCAL`` should be used only in
|
||
|
special cases -- if you know what you are doing. This is used exclusively
|
||
|
for interrupt handlers and similar where the calling convention is not the C
|
||
|
one. ``_NOALIGN`` variants exist too. The use is the same as for the ``FUNC``
|
||
|
category above::
|
||
|
|
||
|
SYM_CODE_START_LOCAL(bad_put_user)
|
||
|
... asm insns ...
|
||
|
SYM_CODE_END(bad_put_user)
|
||
|
|
||
|
Again, every ``SYM_CODE_START*`` **shall** be coupled by ``SYM_CODE_END``.
|
||
|
|
||
|
To some extent, this category corresponds to deprecated ``ENTRY`` and
|
||
|
``END``. Except ``END`` had several other meanings too.
|
||
|
|
||
|
* ``SYM_INNER_LABEL*`` is used to denote a label inside some
|
||
|
``SYM_{CODE,FUNC}_START`` and ``SYM_{CODE,FUNC}_END``. They are very similar
|
||
|
to C labels, except they can be made global. An example of use::
|
||
|
|
||
|
SYM_CODE_START(ftrace_caller)
|
||
|
/* save_mcount_regs fills in first two parameters */
|
||
|
...
|
||
|
|
||
|
SYM_INNER_LABEL(ftrace_caller_op_ptr, SYM_L_GLOBAL)
|
||
|
/* Load the ftrace_ops into the 3rd parameter */
|
||
|
...
|
||
|
|
||
|
SYM_INNER_LABEL(ftrace_call, SYM_L_GLOBAL)
|
||
|
call ftrace_stub
|
||
|
...
|
||
|
retq
|
||
|
SYM_CODE_END(ftrace_caller)
|
||
|
|
||
|
Data Macros
|
||
|
~~~~~~~~~~~
|
||
|
Similar to instructions, there is a couple of macros to describe data in the
|
||
|
assembly.
|
||
|
|
||
|
* ``SYM_DATA_START`` and ``SYM_DATA_START_LOCAL`` mark the start of some data
|
||
|
and shall be used in conjunction with either ``SYM_DATA_END``, or
|
||
|
``SYM_DATA_END_LABEL``. The latter adds also a label to the end, so that
|
||
|
people can use ``lstack`` and (local) ``lstack_end`` in the following
|
||
|
example::
|
||
|
|
||
|
SYM_DATA_START_LOCAL(lstack)
|
||
|
.skip 4096
|
||
|
SYM_DATA_END_LABEL(lstack, SYM_L_LOCAL, lstack_end)
|
||
|
|
||
|
* ``SYM_DATA`` and ``SYM_DATA_LOCAL`` are variants for simple, mostly one-line
|
||
|
data::
|
||
|
|
||
|
SYM_DATA(HEAP, .long rm_heap)
|
||
|
SYM_DATA(heap_end, .long rm_stack)
|
||
|
|
||
|
In the end, they expand to ``SYM_DATA_START`` with ``SYM_DATA_END``
|
||
|
internally.
|
||
|
|
||
|
Support Macros
|
||
|
~~~~~~~~~~~~~~
|
||
|
All the above reduce themselves to some invocation of ``SYM_START``,
|
||
|
``SYM_END``, or ``SYM_ENTRY`` at last. Normally, developers should avoid using
|
||
|
these.
|
||
|
|
||
|
Further, in the above examples, one could see ``SYM_L_LOCAL``. There are also
|
||
|
``SYM_L_GLOBAL`` and ``SYM_L_WEAK``. All are intended to denote linkage of a
|
||
|
symbol marked by them. They are used either in ``_LABEL`` variants of the
|
||
|
earlier macros, or in ``SYM_START``.
|
||
|
|
||
|
|
||
|
Overriding Macros
|
||
|
~~~~~~~~~~~~~~~~~
|
||
|
Architecture can also override any of the macros in their own
|
||
|
``asm/linkage.h``, including macros specifying the type of a symbol
|
||
|
(``SYM_T_FUNC``, ``SYM_T_OBJECT``, and ``SYM_T_NONE``). As every macro
|
||
|
described in this file is surrounded by ``#ifdef`` + ``#endif``, it is enough
|
||
|
to define the macros differently in the aforementioned architecture-dependent
|
||
|
header.
|