============================================================
Extending LLVM: Adding instructions, intrinsics, types, etc.
============================================================

Introduction and Warning
========================

During the course of using LLVM, you may wish to customize it for your research
project or for experimentation. At this point, you may realize that you need to
add something to LLVM, whether it be a new fundamental type, a new intrinsic
function, or a whole new instruction.

When you come to this realization, stop and think. Do you really need to extend
LLVM? Is it a new fundamental capability that LLVM does not support in its
current incarnation, or can it be synthesized from existing LLVM elements? If
you are not sure, ask on the `LLVM-dev
<http://lists.llvm.org/mailman/listinfo/llvm-dev>`_ list. The reason is that
extending LLVM can get involved: you need to update all the different passes
that you intend to use with your extension, and there are *many* LLVM analyses
and transformations, so it may be quite a bit of work.

Adding an `intrinsic function`_ is far easier than adding an
instruction, and is transparent to optimization passes. If your added
functionality can be expressed as a function call, an intrinsic function is the
method of choice for LLVM extension.

Before you invest a significant amount of effort into a non-trivial extension,
**ask on the list** if what you are looking to do can be done with
already-existing infrastructure, or if maybe someone else is already working on
it. You will save yourself a lot of time and effort by doing so.

.. _intrinsic function:

Adding a new intrinsic function
===============================

Adding a new intrinsic function to LLVM is much easier than adding a new
instruction. Almost all extensions to LLVM should start as an intrinsic
function and then be turned into an instruction if warranted.

#. ``llvm/docs/LangRef.rst``:

   Document the intrinsic. Decide whether it is code generator specific and
   what the restrictions are. Talk to other people about it so that you are
   sure it's a good idea.

#. ``llvm/include/llvm/IR/Intrinsics*.td``:

   Add an entry for your intrinsic. Describe its memory access
   characteristics for optimization (this controls whether it will be
   DCE'd, CSE'd, etc.). If any arguments need to be immediates, these
   must be indicated with the ``ImmArg`` property. Note that any intrinsic
   using one of the ``llvm_any*_ty`` types for an argument or return
   type will be deemed by ``tblgen`` as overloaded and the
   corresponding suffix will be required on the intrinsic's name.

#. ``llvm/lib/Analysis/ConstantFolding.cpp``:

   If it is possible to constant fold your intrinsic, add support for it in the
   ``canConstantFoldCallTo`` and ``ConstantFoldCall`` functions.

#. ``llvm/test/*``:

   Add test cases for your intrinsic to the test suite.

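As a rough illustration of the overloading and constant-folding steps, the following self-contained C++ sketch models a hypothetical overloaded intrinsic ``llvm.myext.bitrev`` (not a real LLVM intrinsic) whose name carries a type suffix such as ``.i8`` or ``.i32``. The helpers ``overloadWidth``, ``canFoldMyExtCall``, and ``foldMyExtCall`` are invented here. They only mirror the roles of ``canConstantFoldCallTo`` and ``ConstantFoldCall``; the real functions work on LLVM ``Function`` and ``Constant`` objects rather than strings and plain integers.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>

// Hypothetical overloaded intrinsic (NOT a real LLVM intrinsic):
// "llvm.myext.bitrev.iN" reverses the low N bits of its operand.
const std::string kMyExtBase = "llvm.myext.bitrev";

// Map the overload suffix (".i8", ".i16", ".i32", ".i64") to a bit width.
// Returns std::nullopt if Name is not one of the overloads.
std::optional<unsigned> overloadWidth(const std::string &Name) {
  const std::string Prefix = kMyExtBase + ".i";
  if (Name.compare(0, Prefix.size(), Prefix) != 0)
    return std::nullopt;
  const std::string Suffix = Name.substr(Prefix.size());
  if (Suffix == "8")
    return 8u;
  if (Suffix == "16")
    return 16u;
  if (Suffix == "32")
    return 32u;
  if (Suffix == "64")
    return 64u;
  return std::nullopt;
}

// Plays the role of canConstantFoldCallTo: is this callee foldable at all?
bool canFoldMyExtCall(const std::string &Name) {
  return overloadWidth(Name).has_value();
}

// Plays the role of ConstantFoldCall: fold a call whose argument is constant.
// Bits above the overload's width are ignored.
std::optional<uint64_t> foldMyExtCall(const std::string &Name, uint64_t Arg) {
  const std::optional<unsigned> Width = overloadWidth(Name);
  if (!Width)
    return std::nullopt;
  assert(*Width <= 64 && "wider overloads are not modeled");
  uint64_t Result = 0;
  for (unsigned I = 0; I < *Width; ++I)
    if (Arg & (uint64_t(1) << I))
      Result |= uint64_t(1) << (*Width - 1 - I);
  return Result;
}
```

Keeping the "can this be folded?" check separate from the folding itself lets a caller reject unknown callees cheaply before it inspects any arguments.
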
Once the intrinsic has been added to the system, you must add code generator
support for it. Generally you must do the following steps:

#. Add support to the .td file for the target(s) of your choice in
   ``lib/Target/*/*.td``.

   This is usually a matter of adding a pattern to the .td file that matches the
   intrinsic, though it may obviously require adding the instructions you want to
   generate as well. There are lots of examples in the PowerPC and X86 backends
   to follow.

Adding a new SelectionDAG node
==============================

As with intrinsics, adding a new SelectionDAG node to LLVM is much easier than
adding a new instruction. New nodes are often added to help represent
instructions common to many targets. These nodes often map to an LLVM
instruction (add, sub) or intrinsic (byteswap, population count). In other
cases, new nodes have been added to allow many targets to perform a common task
(converting between floating point and integer representation) or capture more
complicated behavior in a single node (rotate).

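The rotate case shows why a dedicated node is useful. Without one, a rotate has to be spelled as two shifts and an OR, and every target would need to recognize that combination separately. The C++ sketch below works on plain 32-bit integers and does not use LLVM APIs. ``rotl32`` stands for what a single rotate node computes, and ``rotl32Expanded`` shows one way to rewrite such a node using only shifts and an OR. Both are illustrative names, and both assume the rotate amount is taken modulo the bit width.

```cpp
#include <cassert>
#include <cstdint>

// What a single 32-bit rotate-left node computes. The rotate amount is taken
// modulo 32, so rotating by 32 returns the input unchanged.
uint32_t rotl32(uint32_t X, unsigned Amt) {
  Amt &= 31;
  if (Amt == 0)
    return X; // X >> 32 below would be undefined behavior
  return (X << Amt) | (X >> (32 - Amt));
}

// The same result built only from shifts and an OR, the kind of node
// sequence a target without a rotate instruction has to fall back on.
// Masking both shift amounts with 31 keeps them in range even when Amt is 0.
uint32_t rotl32Expanded(uint32_t X, unsigned Amt) {
  const unsigned Left = Amt & 31;
  const unsigned Right = (0u - Amt) & 31;
  return (X << Left) | (X >> Right);
}
```
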
#. ``include/llvm/CodeGen/ISDOpcodes.h``:

   Add an enum value for the new SelectionDAG node.

#. ``lib/CodeGen/SelectionDAG/SelectionDAG.cpp``:

   Add code to print the node to ``getOperationName``. If your new node can be
   evaluated at compile time when given constant arguments (such as an add of a
   constant with another constant), find the ``getNode`` method that takes the
   appropriate number of arguments, and add a case for your node to the switch
   statement that performs constant folding for nodes that take the same number
   of arguments as your new node.

#. ``lib/CodeGen/SelectionDAG/LegalizeDAG.cpp``:

   Add code to `legalize, promote, and expand
   <CodeGenerator.html#selectiondag_legalize>`_ the node as necessary. At a
   minimum, you will need to add a case statement for your node in
   ``LegalizeOp`` which calls ``LegalizeOp`` on the node's operands, and returns
   a new node if any of the operands changed as a result of being legalized. It
   is likely that not all targets supported by the SelectionDAG framework will
   natively support the new node. In this case, you must also add code in your
   node's case statement in ``LegalizeOp`` to Expand your node into simpler,
   legal operations. The case for ``ISD::UREM`` for expanding a remainder into
   a divide, a multiply, and a subtract is a good example.

#. ``lib/CodeGen/SelectionDAG/LegalizeDAG.cpp``:

   If targets may support the new node being added only at certain sizes, you
   will also need to add code to your node's case statement in ``LegalizeOp``
   to Promote your node's operands to a larger size, and perform the correct
   operation. You will also need to add code to ``PromoteOp`` to do this. For a
   good example, see ``ISD::BSWAP``, which promotes its operand to a wider
   size, performs the byteswap, and then shifts the correct bytes right to
   emulate the narrower byteswap in the wider type.

#. ``lib/CodeGen/SelectionDAG/LegalizeDAG.cpp``:

   Add a case for your node in ``ExpandOp`` to teach the legalizer how to
   perform the action represented by the new node on a value that has been split
   into high and low halves. This case will be used to support your node with a
   64-bit operand on a 32-bit target.

#. ``lib/CodeGen/SelectionDAG/DAGCombiner.cpp``:

   If your node can be combined with itself, or other existing nodes in a
   peephole-like fashion, add a visit function for it, and call that function
   from ``DAGCombiner::visit``. There are several good examples for simple
   combines you can do; ``visitFABS`` and ``visitSRL`` are good starting places.

#. ``lib/Target/PowerPC/PPCISelLowering.cpp``:

   Each target has an implementation of the ``TargetLowering`` class, usually in
   its own file (although some targets include it in the same file as the
   DAGToDAGISel). The default behavior for a target is to assume that your new
   node is legal for all types that are legal for that target. If this target
   does not natively support your node, then tell the target to either Promote
   it (if it is supported at a larger type) or Expand it. This will cause the
   code you wrote in ``LegalizeOp`` above to decompose your new node into other
   legal nodes for this target.

#. ``include/llvm/Target/TargetSelectionDAG.td``:

   Most current targets supported by LLVM generate code using the DAGToDAG
   method, where SelectionDAG nodes are pattern matched to target-specific
   nodes, which represent individual instructions. In order for the targets to
   match an instruction to your new node, you must add a def for that node to
   the list in this file, with the appropriate type constraints. Look at
   ``add``, ``bswap``, and ``fadd`` for examples.

#. ``lib/Target/PowerPC/PPCInstrInfo.td``:

   Each target has a tablegen file that describes the target's instruction set.
   For targets that use the DAGToDAG instruction selection framework, add a
   pattern for your new node that uses one or more target nodes. Documentation
   for this is a bit sparse right now, but there are several decent examples.
   See the patterns for ``rotl`` in ``PPCInstrInfo.td``.

#. TODO: document complex patterns.

#. ``llvm/test/CodeGen/*``:

   Add test cases for your new node to the test suite.
   ``llvm/test/CodeGen/X86/bswap.ll`` is a good example.

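The steps above describe three legalization strategies: Expand into simpler operations, Promote to a wider type, and split a value into high and low halves. They can be illustrated on plain integers. The following C++ sketch is not SelectionDAG code. The function names and the ``Halves`` struct are hypothetical, and each function only computes what the legalizer would build out of DAG nodes for ``ISD::UREM`` and ``ISD::BSWAP``.

```cpp
#include <cassert>
#include <cstdint>

// Expand: x urem y == x - (x udiv y) * y, using only a divide, a multiply
// and a subtract (the ISD::UREM strategy).
uint32_t expandURem(uint32_t X, uint32_t Y) {
  assert(Y != 0 && "remainder by zero is undefined");
  uint32_t Quot = X / Y;
  return X - Quot * Y;
}

// A 32-bit byte swap, standing in for a target's native 32-bit BSWAP.
uint32_t bswap32(uint32_t X) {
  return (X >> 24) | ((X >> 8) & 0x0000FF00u) | ((X << 8) & 0x00FF0000u) |
         (X << 24);
}

// Promote: a 16-bit byte swap performed in 32 bits. Zero-extend, swap, then
// shift right so the swapped bytes land back in the low half.
uint16_t promoteBSwap16(uint16_t X) {
  uint32_t Wide = X; // zero-extend to the wider, legal type
  return static_cast<uint16_t>(bswap32(Wide) >> 16);
}

// Split into halves: a 64-bit byte swap on a target with only 32-bit
// registers. Each half is swapped and the two halves trade places.
struct Halves {
  uint32_t Lo;
  uint32_t Hi;
};

Halves expandBSwap64(Halves In) {
  return Halves{bswap32(In.Hi), bswap32(In.Lo)};
}
```

In ``expandBSwap64``, neither half of the result can be computed from its own input half alone, which is why the ``ExpandOp`` case has to see both halves at once.
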
Adding a new instruction
========================

.. warning::

   Adding instructions changes the bitcode format, and it will take some effort
   to maintain compatibility with the previous version. Only add an instruction
   if it is absolutely necessary.

#. ``llvm/include/llvm/IR/Instruction.def``:

   add a number for your instruction and an enum name

#. ``llvm/include/llvm/IR/Instructions.h``:

   add a definition for the class that will represent your instruction

#. ``llvm/include/llvm/IR/InstVisitor.h``:

   add a prototype for a visitor to your new instruction type

#. ``llvm/lib/AsmParser/LLLexer.cpp``:

   add a new token to parse your instruction from an assembly text file

#. ``llvm/lib/AsmParser/LLParser.cpp``:

   add the grammar for how your instruction is read and what it will
   construct as a result

#. ``llvm/lib/Bitcode/Reader/BitcodeReader.cpp``:

   add a case for your instruction and how it will be parsed from bitcode

#. ``llvm/lib/Bitcode/Writer/BitcodeWriter.cpp``:

   add a case for your instruction and how it will be written to bitcode

#. ``llvm/lib/IR/Instruction.cpp``:

   add a case for how your instruction will be printed out to assembly

#. ``llvm/lib/IR/Instructions.cpp``:

   implement the class you defined in ``llvm/include/llvm/IR/Instructions.h``

#. Test your instruction

#. ``llvm/lib/Target/*``:

   add support for your instruction to code generators, or add a lowering pass.

#. ``llvm/test/*``:

   add your test cases to the test suite.

Also, you need to implement (or modify) any analyses or passes that you want to
understand this new instruction.

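``Instruction.def`` follows the X-macro pattern. Each instruction is listed once, and other files include it with different macro definitions to generate things such as opcode enums and visitor dispatch, which is why a new entry there ripples into several of the files listed above. The self-contained C++ sketch below imitates that pattern with a hypothetical ``MyOp`` instruction. The ``FOR_EACH_MY_INST`` macro, the classes, ``getOpcodeName``, and ``CountingVisitor`` are invented for illustration and are not LLVM's definitions.

```cpp
#include <cassert>
#include <string>

// Stand-in for Instruction.def: each instruction appears exactly once as
// (number, opcode name, class name). MyOp is the hypothetical new entry.
#define FOR_EACH_MY_INST(HANDLE)                                               \
  HANDLE(1, Add, BinaryInst)                                                   \
  HANDLE(2, Load, LoadInst)                                                    \
  HANDLE(3, MyOp, MyOpInst)

// Expansion 1: the opcode enum.
enum class Opcode {
#define AS_ENUM(Num, Opc, Class) Opc = Num,
  FOR_EACH_MY_INST(AS_ENUM)
#undef AS_ENUM
};

// Expansion 2: a printable name per opcode.
const char *getOpcodeName(Opcode Op) {
  switch (Op) {
#define AS_NAME(Num, Opc, Class)                                               \
  case Opcode::Opc:                                                            \
    return #Opc;
    FOR_EACH_MY_INST(AS_NAME)
#undef AS_NAME
  }
  return "<invalid>";
}

// Minimal instruction classes; each records its own opcode.
struct Inst {
  explicit Inst(Opcode Op) : Op(Op) {}
  Opcode Op;
};
struct BinaryInst : Inst {
  BinaryInst() : Inst(Opcode::Add) {}
};
struct LoadInst : Inst {
  LoadInst() : Inst(Opcode::Load) {}
};
struct MyOpInst : Inst {
  MyOpInst() : Inst(Opcode::MyOp) {}
};

// Expansion 3: visitor dispatch on the opcode, one hook per class
// (loosely the role InstVisitor.h plays for real instructions).
struct CountingVisitor {
  int MyOpCount = 0;
  int OtherCount = 0;
  void visitBinaryInst(BinaryInst &) { ++OtherCount; }
  void visitLoadInst(LoadInst &) { ++OtherCount; }
  void visitMyOpInst(MyOpInst &) { ++MyOpCount; }

  void visit(Inst &I) {
    switch (I.Op) {
#define AS_VISIT(Num, Opc, Class)                                              \
  case Opcode::Opc:                                                            \
    visit##Class(static_cast<Class &>(I));                                     \
    return;
      FOR_EACH_MY_INST(AS_VISIT)
#undef AS_VISIT
    }
  }
};
```

Adding the ``MyOp`` line to the table is enough to extend the enum, the name lookup, and the dispatch switch. The visitor still fails to compile until someone writes ``visitMyOpInst``, which mirrors how the real files have to be updated together.
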
Adding a new type
=================

.. warning::

   Adding new types changes the bitcode format, and will break compatibility with
   currently-existing LLVM installations. Only add new types if it is absolutely
   necessary.

Adding a fundamental type
-------------------------

#. ``llvm/include/llvm/IR/Type.h``:

   add an enum value for the new type; add static ``Type*`` for this type

#. ``llvm/lib/IR/Type.cpp`` and ``llvm/lib/CodeGen/ValueTypes.cpp``:

   add mapping from ``TypeID`` => ``Type*``; initialize the static ``Type*``

#. ``llvm/include/llvm-c/Core.h`` and ``llvm/lib/IR/Core.cpp``:

   add an enumerator to ``LLVMTypeKind`` and modify
   ``LLVMTypeKind LLVMGetTypeKind(LLVMTypeRef Ty)`` for the new type

#. ``llvm/lib/AsmParser/LLLexer.cpp``:

   add the ability to parse in the type from text assembly

#. ``llvm/lib/AsmParser/LLParser.cpp``:

   add a token for that type

#. ``llvm/lib/Bitcode/Writer/BitcodeWriter.cpp``:

   modify ``void ModuleBitcodeWriter::writeTypeTable()`` to serialize your type

#. ``llvm/lib/Bitcode/Reader/BitcodeReader.cpp``:

   modify ``Error BitcodeReader::parseTypeTableBody()`` to read your data type

#. ``llvm/include/llvm/Bitcode/LLVMBitCodes.h``:

   add an enumerator to ``TypeCodes`` for the new type

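The writer and reader changes have to agree on the code chosen in ``TypeCodes``. The C++ sketch below shows that round trip under simplifying assumptions. A plain vector of integers stands in for LLVM's bitstream records, and ``ToyTypeID``, ``ToyTypeCode``, ``writeTypeCodes``, and ``readTypeCodes`` are hypothetical stand-ins for ``Type::TypeID``, ``TypeCodes``, ``writeTypeTable``, and ``parseTypeTableBody``. The numeric code values are made up.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical in-memory type IDs (stand-in for Type::TypeID).
enum class ToyTypeID { Void, Int32, MyType };

// Hypothetical bitcode type codes (stand-in for TypeCodes in LLVMBitCodes.h).
// The values are made up; once files using a code exist, it must not change.
enum ToyTypeCode : uint64_t {
  TOY_TYPE_CODE_VOID = 1,
  TOY_TYPE_CODE_INT32 = 2,
  TOY_TYPE_CODE_MYTYPE = 3, // new code for the new fundamental type
};

// Writer side (cf. writeTypeTable): one record code per type.
std::vector<uint64_t> writeTypeCodes(const std::vector<ToyTypeID> &Types) {
  std::vector<uint64_t> Records;
  for (ToyTypeID T : Types) {
    switch (T) {
    case ToyTypeID::Void:
      Records.push_back(TOY_TYPE_CODE_VOID);
      break;
    case ToyTypeID::Int32:
      Records.push_back(TOY_TYPE_CODE_INT32);
      break;
    case ToyTypeID::MyType:
      Records.push_back(TOY_TYPE_CODE_MYTYPE);
      break;
    }
  }
  return Records;
}

// Reader side (cf. parseTypeTableBody): map codes back to types. An unknown
// code is an error, which is what a reader that predates the new type sees.
std::optional<std::vector<ToyTypeID>>
readTypeCodes(const std::vector<uint64_t> &Records) {
  std::vector<ToyTypeID> Types;
  for (uint64_t Code : Records) {
    switch (Code) {
    case TOY_TYPE_CODE_VOID:
      Types.push_back(ToyTypeID::Void);
      break;
    case TOY_TYPE_CODE_INT32:
      Types.push_back(ToyTypeID::Int32);
      break;
    case TOY_TYPE_CODE_MYTYPE:
      Types.push_back(ToyTypeID::MyType);
      break;
    default:
      return std::nullopt;
    }
  }
  return Types;
}
```

The ``default`` branch in ``readTypeCodes`` is where a reader built before the new type existed would end up, which is the compatibility break the warning at the start of this section describes.
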
Adding a derived type
---------------------

#. ``llvm/include/llvm/IR/Type.h``:

   add an enum value for the new type; also add a forward declaration of the
   type

#. ``llvm/include/llvm/IR/DerivedTypes.h``:

   add a new class to represent the new type in the hierarchy; add a forward
   declaration to the TypeMap value type

#. ``llvm/lib/IR/Type.cpp`` and ``llvm/lib/CodeGen/ValueTypes.cpp``:

   add support for the derived type, notably ``enum TypeID`` and the ``is``
   and ``get`` methods.

#. ``llvm/include/llvm-c/Core.h`` and ``llvm/lib/IR/Core.cpp``:

   add an enumerator to ``LLVMTypeKind`` and modify
   ``LLVMTypeKind LLVMGetTypeKind(LLVMTypeRef Ty)`` for the new type

#. ``llvm/lib/AsmParser/LLLexer.cpp``:

   modify ``lltok::Kind LLLexer::LexIdentifier()`` to add the ability to
   parse in the type from text assembly

#. ``llvm/lib/Bitcode/Writer/BitcodeWriter.cpp``:

   modify ``void ModuleBitcodeWriter::writeTypeTable()`` to serialize your type

#. ``llvm/lib/Bitcode/Reader/BitcodeReader.cpp``:

   modify ``Error BitcodeReader::parseTypeTableBody()`` to read your data type

#. ``llvm/include/llvm/Bitcode/LLVMBitCodes.h``:

   add an enumerator to ``TypeCodes`` for the new type

#. ``llvm/lib/IR/AsmWriter.cpp``:

   modify ``void TypePrinting::print(Type *Ty, raw_ostream &OS)``
   to output the new derived type

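The derived-type steps center on one new class plus the code that identifies and prints it. The C++ sketch below uses a hypothetical pair type, printed with an invented ``mypair<...>`` syntax, in a toy hierarchy that imitates LLVM's ``TypeID``-based ``classof`` idiom (the hook that ``isa`` and ``dyn_cast`` rely on). ``ToyType``, ``ToyIntegerType``, ``ToyPairType``, and ``printType`` are not LLVM classes or functions; ``printType`` only plays the role of ``TypePrinting::print``.

```cpp
#include <cassert>
#include <string>

// Toy type hierarchy keyed by a TypeID enum, in the style of LLVM's classof.
class ToyType {
public:
  enum TypeID { IntegerTyID, FloatTyID, PairTyID };
  explicit ToyType(TypeID ID) : ID(ID) {}
  virtual ~ToyType() = default;
  TypeID getTypeID() const { return ID; }

private:
  TypeID ID;
};

class ToyIntegerType : public ToyType {
public:
  explicit ToyIntegerType(unsigned Bits) : ToyType(IntegerTyID), Bits(Bits) {}
  unsigned getBitWidth() const { return Bits; }
  static bool classof(const ToyType *T) { return T->getTypeID() == IntegerTyID; }

private:
  unsigned Bits;
};

// The hypothetical new derived type: an ordered pair of two element types.
class ToyPairType : public ToyType {
public:
  ToyPairType(const ToyType *First, const ToyType *Second)
      : ToyType(PairTyID), First(First), Second(Second) {}
  const ToyType *getFirst() const { return First; }
  const ToyType *getSecond() const { return Second; }
  static bool classof(const ToyType *T) { return T->getTypeID() == PairTyID; }

private:
  const ToyType *First;
  const ToyType *Second;
};

// Printing: one case per TypeID, recursing into the element types of the new
// derived type.
std::string printType(const ToyType *T) {
  switch (T->getTypeID()) {
  case ToyType::IntegerTyID:
    return "i" + std::to_string(
                     static_cast<const ToyIntegerType *>(T)->getBitWidth());
  case ToyType::FloatTyID:
    return "float";
  case ToyType::PairTyID: {
    const auto *P = static_cast<const ToyPairType *>(T);
    return "mypair<" + printType(P->getFirst()) + ", " +
           printType(P->getSecond()) + ">";
  }
  }
  return "<unknown>";
}
```

Because the printer recurses through ``getFirst`` and ``getSecond``, nested instances of the new type print without extra code, in the same way that derived types compose with every other type in the hierarchy.
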