===============================
|
||
|
ORC Design and Implementation
|
||
|
===============================
|
||
|
|
||
|
.. contents::
|
||
|
:local:
|
||
|
|
||
|
Introduction
|
||
|
============
|
||
|
|
||
|
This document aims to provide a high-level overview of the design and
|
||
|
implementation of the ORC JIT APIs. Except where otherwise stated all discussion
|
||
|
refers to the modern ORCv2 APIs (available since LLVM 7). Clients wishing to
|
||
|
transition from ORCv1 should see Section :ref:`transitioning_orcv1_to_orcv2`.
|
||
|
|
||
|
Use-cases
|
||
|
=========
|
||
|
|
||
|
ORC provides a modular API for building JIT compilers. There are a number
|
||
|
of use cases for such an API. For example:
|
||
|
|
||
|
1. The LLVM tutorials use a simple ORC-based JIT class to execute expressions
|
||
|
compiled from a toy language: Kaleidoscope.
|
||
|
|
||
|
2. The LLVM debugger, LLDB, uses a cross-compiling JIT for expression
|
||
|
evaluation. In this use case, cross compilation allows expressions compiled
|
||
|
in the debugger process to be executed on the debug target process, which may
|
||
|
be on a different device/architecture.
|
||
|
|
||
|
3. High-performance JITs (e.g. JVMs, Julia) that want to make use of LLVM's
   optimizations within an existing JIT infrastructure.
|
||
|
|
||
|
4. Interpreters and REPLs, e.g. Cling (C++) and the Swift interpreter.
|
||
|
|
||
|
By adopting a modular, library-based design we aim to make ORC useful in as many
|
||
|
of these contexts as possible.
|
||
|
|
||
|
Features
|
||
|
========
|
||
|
|
||
|
ORC provides the following features:
|
||
|
|
||
|
**JIT-linking**
|
||
|
ORC provides APIs to link relocatable object files (COFF, ELF, MachO) [1]_
|
||
|
into a target process at runtime. The target process may be the same process
|
||
|
that contains the JIT session object and jit-linker, or may be another process
|
||
|
(even one running on a different machine or architecture) that communicates
|
||
|
with the JIT via RPC.
|
||
|
|
||
|
**LLVM IR compilation**
|
||
|
ORC provides off the shelf components (IRCompileLayer, SimpleCompiler,
|
||
|
ConcurrentIRCompiler) that make it easy to add LLVM IR to a JIT'd process.
|
||
|
|
||
|
**Eager and lazy compilation**
|
||
|
By default, ORC will compile symbols as soon as they are looked up in the JIT
|
||
|
session object (``ExecutionSession``). Compiling eagerly by default makes it
|
||
|
easy to use ORC as an in-memory compiler for an existing JIT (similar to how
|
||
|
MCJIT is commonly used). However ORC also provides built-in support for lazy
|
||
|
compilation via lazy-reexports (see :ref:`Laziness`).
|
||
|
|
||
|
**Support for Custom Compilers and Program Representations**
|
||
|
Clients can supply custom compilers for each symbol that they define in their
|
||
|
JIT session. ORC will run the user-supplied compiler when a definition of
|
||
|
a symbol is needed. ORC is actually fully language agnostic: LLVM IR is not
|
||
|
treated specially, and is supported via the same wrapper mechanism (the
|
||
|
``MaterializationUnit`` class) that is used for custom compilers.
|
||
|
|
||
|
**Concurrent JIT'd code** and **Concurrent Compilation**
|
||
|
JIT'd code may be executed in multiple threads, may spawn new threads, and may
|
||
|
re-enter the ORC (e.g. to request lazy compilation) concurrently from multiple
|
||
|
threads. Compilers launched by ORC can run concurrently (provided the client
|
||
|
sets up an appropriate dispatcher). Built-in dependency tracking ensures that
|
||
|
ORC does not release pointers to JIT'd code or data until all dependencies
|
||
|
have also been JIT'd and they are safe to call or use.
|
||
|
|
||
|
**Removable Code**
|
||
|
Resources for JIT'd program representations
|
||
|
|
||
|
**Orthogonality** and **Composability**
|
||
|
Each of the features above can be used independently. It is possible to put
|
||
|
ORC components together to make a non-lazy, in-process, single threaded JIT
|
||
|
or a lazy, out-of-process, concurrent JIT, or anything in between.
|
||
|
|
||
|
LLJIT and LLLazyJIT
|
||
|
===================
|
||
|
|
||
|
ORC provides two basic JIT classes off-the-shelf. These are useful both as
|
||
|
examples of how to assemble ORC components to make a JIT, and as replacements
|
||
|
for earlier LLVM JIT APIs (e.g. MCJIT).
|
||
|
|
||
|
The LLJIT class uses an IRCompileLayer and RTDyldObjectLinkingLayer to support
|
||
|
compilation of LLVM IR and linking of relocatable object files. All operations
|
||
|
are performed eagerly on symbol lookup (i.e. a symbol's definition is compiled
|
||
|
as soon as you attempt to look up its address). LLJIT is a suitable replacement
|
||
|
for MCJIT in most cases (note: some more advanced features, e.g.
|
||
|
JITEventListeners are not supported yet).
|
||
|
|
||
|
The LLLazyJIT extends LLJIT and adds a CompileOnDemandLayer to enable lazy
|
||
|
compilation of LLVM IR. When an LLVM IR module is added via the addLazyIRModule
|
||
|
method, function bodies in that module will not be compiled until they are first
|
||
|
called. LLLazyJIT aims to provide a replacement for LLVM's original (pre-MCJIT)
|
||
|
JIT API.
|
||
|
|
||
|
LLJIT and LLLazyJIT instances can be created using their respective builder
|
||
|
classes: LLJITBuilder and LLLazyJITBuilder. For example, assuming you have a
|
||
|
module ``M`` loaded on a ThreadSafeContext ``Ctx``:
|
||
|
|
||
|
.. code-block:: c++

  // Try to detect the host arch and construct an LLJIT instance.
  auto JIT = LLJITBuilder().create();

  // If we could not construct an instance, return an error.
  if (!JIT)
    return JIT.takeError();

  // Add the module. JIT is an Expected<std::unique_ptr<LLJIT>>, so it is
  // dereferenced once to reach the LLJIT instance.
  if (auto Err = (*JIT)->addIRModule(ThreadSafeModule(std::move(M), Ctx)))
    return Err;

  // Look up the JIT'd code entry point.
  auto EntrySym = (*JIT)->lookup("entry");
  if (!EntrySym)
    return EntrySym.takeError();

  // Cast the entry point address to a function pointer.
  auto *Entry = (void(*)())EntrySym->getAddress();

  // Call into JIT'd code.
  Entry();
|
||
|
|
||
|
The builder classes provide a number of configuration options that can be
|
||
|
specified before the JIT instance is constructed. For example:
|
||
|
|
||
|
.. code-block:: c++

  // Build an LLLazyJIT instance that uses four worker threads for compilation,
  // and jumps to a specific error handler (rather than null) on lazy compile
  // failures.

  void handleLazyCompileFailure() {
    // JIT'd code will jump here if lazy compilation fails, giving us an
    // opportunity to exit or throw an exception into JIT'd code.
    throw JITFailed();
  }

  auto JIT = LLLazyJITBuilder()
               .setNumCompileThreads(4)
               .setLazyCompileFailureAddr(
                   pointerToJITTargetAddress(&handleLazyCompileFailure))
               .create();

  // ...
|
||
|
|
||
|
For users wanting to get started with LLJIT a minimal example program can be
|
||
|
found at ``llvm/examples/HowToUseLLJIT``.
|
||
|
|
||
|
Design Overview
|
||
|
===============
|
||
|
|
||
|
ORC's JIT program model aims to emulate the linking and symbol resolution
|
||
|
rules used by the static and dynamic linkers. This allows ORC to JIT
|
||
|
arbitrary LLVM IR, including IR produced by an ordinary static compiler (e.g.
|
||
|
clang) that uses constructs like symbol linkage and visibility, and weak [3]_
|
||
|
and common symbol definitions.
|
||
|
|
||
|
To see how this works, imagine a program ``myapp`` which links against a pair
|
||
|
of dynamic libraries: ``libA`` and ``libB``. On the command line, building this
|
||
|
program might look like:
|
||
|
|
||
|
.. code-block:: bash

  $ clang++ -shared -o libA.dylib a1.cpp a2.cpp
  $ clang++ -shared -o libB.dylib b1.cpp b2.cpp
  $ clang++ -o myapp myapp.cpp -L. -lA -lB
  $ ./myapp
|
||
|
|
||
|
In ORC, this would translate into API calls on a hypothetical CXXCompilingLayer
|
||
|
(with error checking omitted for brevity) as:
|
||
|
|
||
|
.. code-block:: c++

  ExecutionSession ES;
  RTDyldObjectLinkingLayer ObjLinkingLayer(
      ES, []() { return std::make_unique<SectionMemoryManager>(); });
  CXXCompilingLayer CXXLayer(ES, ObjLinkingLayer);

  // Create JITDylib "A" and add code to it using the CXX layer.
  auto &LibA = ES.createJITDylib("A");
  CXXLayer.add(LibA, MemoryBuffer::getFile("a1.cpp"));
  CXXLayer.add(LibA, MemoryBuffer::getFile("a2.cpp"));

  // Create JITDylib "B" and add code to it using the CXX layer.
  auto &LibB = ES.createJITDylib("B");
  CXXLayer.add(LibB, MemoryBuffer::getFile("b1.cpp"));
  CXXLayer.add(LibB, MemoryBuffer::getFile("b2.cpp"));

  // Create and specify the search order for the main JITDylib. This is
  // equivalent to a "links against" relationship in a command-line link.
  auto &MainJD = ES.createJITDylib("main");
  MainJD.addToLinkOrder(&LibA);
  MainJD.addToLinkOrder(&LibB);
  CXXLayer.add(MainJD, MemoryBuffer::getFile("main.cpp"));

  // Look up the JIT'd main, cast it to a function pointer, then call it.
  auto MainSym = ExitOnErr(ES.lookup({&MainJD}, "main"));
  auto *Main = (int(*)(int, char*[]))MainSym.getAddress();

  int Result = Main(...);
|
||
|
|
||
|
This example tells us nothing about *how* or *when* compilation will happen.
|
||
|
That will depend on the implementation of the hypothetical CXXCompilingLayer.
|
||
|
The same linker-based symbol resolution rules will apply regardless of that
|
||
|
implementation, however. For example, if a1.cpp and a2.cpp both define a
|
||
|
function "foo" then ORCv2 will generate a duplicate definition error. On the
|
||
|
other hand, if a1.cpp and b1.cpp both define "foo" there is no error (different
|
||
|
dynamic libraries may define the same symbol). If main.cpp refers to "foo", it
|
||
|
should bind to the definition in LibA rather than the one in LibB, since
|
||
|
main.cpp is part of the "main" dylib, and the main dylib links against LibA
|
||
|
before LibB.
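
The resolution rules described above can be illustrated with a small
stand-alone model. This is only a sketch: ``ToyDylib`` and its members are
hypothetical names invented for illustration, not part of the ORC API, and
addresses are plain integers.

.. code-block:: c++

  #include <map>
  #include <string>
  #include <vector>

  // Toy stand-in for a JITDylib: a symbol table plus an ordered list of
  // other dylibs to search. This is not ORC's real class.
  struct ToyDylib {
    std::string Name;
    std::map<std::string, int> Symbols;      // symbol name -> fake address
    std::vector<const ToyDylib *> LinkOrder; // searched after this dylib

    // Returns false on a duplicate definition within this dylib.
    bool define(const std::string &Sym, int Addr) {
      return Symbols.emplace(Sym, Addr).second;
    }

    // Search this dylib first, then each dylib in link order. Returns a
    // pointer to the fake address, or nullptr if the symbol is not found.
    const int *lookup(const std::string &Sym) const {
      auto It = Symbols.find(Sym);
      if (It != Symbols.end())
        return &It->second;
      for (const ToyDylib *JD : LinkOrder) {
        auto DepIt = JD->Symbols.find(Sym);
        if (DepIt != JD->Symbols.end())
          return &DepIt->second;
      }
      return nullptr;
    }
  };

In this model a second ``define`` of "foo" in the same dylib is rejected,
while two different dylibs may each define "foo"; a lookup from the main dylib
then binds to whichever dylib comes first in its link order.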
|
||
|
|
||
|
Many JIT clients will have no need for this strict adherence to the usual
|
||
|
ahead-of-time linking rules, and should be able to get by just fine by putting
|
||
|
all of their code in a single JITDylib. However, clients who want to JIT code
|
||
|
for languages/projects that traditionally rely on ahead-of-time linking (e.g.
|
||
|
C++) will find that this feature makes life much easier.
|
||
|
|
||
|
Symbol lookup in ORC serves two other important functions, beyond providing
|
||
|
addresses for symbols: (1) It triggers compilation of the symbol(s) searched for
|
||
|
(if they have not been compiled already), and (2) it provides the
|
||
|
synchronization mechanism for concurrent compilation. The pseudo-code for the
|
||
|
lookup process is:
|
||
|
|
||
|
.. code-block:: none

  construct a query object from a query set and query handler
  lock the session
  lodge query against requested symbols, collect required materializers (if any)
  unlock the session
  dispatch materializers (if any)
|
||
|
|
||
|
In this context a materializer is something that provides a working definition
|
||
|
of a symbol upon request. Usually materializers are just wrappers for compilers,
|
||
|
but they may also wrap a jit-linker directly (if the program representation
|
||
|
backing the definitions is an object file), or may even be a class that writes
|
||
|
bits directly into memory (for example, if the definitions are
|
||
|
stubs). Materialization is the blanket term for any actions (compiling, linking,
|
||
|
splatting bits, registering with runtimes, etc.) that are required to generate a
|
||
|
symbol definition that is safe to call or access.
|
||
|
|
||
|
As each materializer completes its work it notifies the JITDylib, which in turn
|
||
|
notifies any query objects that are waiting on the newly materialized
|
||
|
definitions. Each query object maintains a count of the number of symbols that
|
||
|
it is still waiting on, and once this count reaches zero the query object calls
|
||
|
the query handler with a *SymbolMap* (a map of symbol names to addresses)
|
||
|
describing the result. If any symbol fails to materialize the query immediately
|
||
|
calls the query handler with an error.
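
The counting behavior of a query object can be sketched as follows. This is a
simplified, single-threaded model (the real session performs these updates
under its lock); ``ToyQuery`` and its method names are hypothetical and chosen
only for illustration.

.. code-block:: c++

  #include <functional>
  #include <map>
  #include <string>
  #include <utility>

  // Toy query object: counts outstanding symbols and fires the handler once.
  class ToyQuery {
  public:
    using SymbolMap = std::map<std::string, unsigned long>;
    using Handler = std::function<void(bool Ok, const SymbolMap &)>;

    ToyQuery(unsigned NumSymbols, Handler H)
        : Outstanding(NumSymbols), OnComplete(std::move(H)) {}

    // Called when one requested symbol has been materialized.
    void notifyResolved(const std::string &Name, unsigned long Addr) {
      if (!OnComplete)
        return; // already completed or failed
      Result[Name] = Addr;
      if (--Outstanding == 0)
        finish(true);
    }

    // Called when any requested symbol fails to materialize.
    void notifyFailed() {
      if (OnComplete)
        finish(false);
    }

  private:
    // Runs the handler exactly once, then disarms the query.
    void finish(bool Ok) {
      Handler H = std::move(OnComplete);
      OnComplete = nullptr;
      H(Ok, Result);
    }

    unsigned Outstanding;
    Handler OnComplete;
    SymbolMap Result;
  };

The handler runs only when the last outstanding symbol is resolved, or
immediately on the first failure; later notifications are ignored.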
|
||
|
|
||
|
The collected materialization units are sent to the ExecutionSession to be
|
||
|
dispatched, and the dispatch behavior can be set by the client. By default each
|
||
|
materializer is run on the calling thread. Clients are free to create new
|
||
|
threads to run materializers, or to send the work to a work queue for a thread
|
||
|
pool (this is what LLJIT/LLLazyJIT do).
|
||
|
|
||
|
Top Level APIs
|
||
|
==============
|
||
|
|
||
|
Many of ORC's top-level APIs are visible in the example above:
|
||
|
|
||
|
- *ExecutionSession* represents the JIT'd program and provides context for the
|
||
|
JIT: It contains the JITDylibs, error reporting mechanisms, and dispatches the
|
||
|
materializers.
|
||
|
|
||
|
- *JITDylibs* provide the symbol tables.
|
||
|
|
||
|
- *Layers* (ObjLinkingLayer and CXXLayer) are wrappers around compilers and
|
||
|
allow clients to add uncompiled program representations supported by those
|
||
|
compilers to JITDylibs.
|
||
|
|
||
|
Several other important APIs are used implicitly. JIT clients need not be aware
|
||
|
of them, but Layer authors will use them:
|
||
|
|
||
|
- *MaterializationUnit* - When XXXLayer::add is invoked it wraps the given
|
||
|
program representation (in this example, C++ source) in a MaterializationUnit,
|
||
|
which is then stored in the JITDylib. MaterializationUnits are responsible for
|
||
|
describing the definitions they provide, and for unwrapping the program
|
||
|
representation and passing it back to the layer when compilation is required
|
||
|
(this ownership shuffle makes writing thread-safe layers easier, since the
|
||
|
ownership of the program representation will be passed back on the stack,
|
||
|
rather than having to be fished out of a Layer member, which would require
|
||
|
synchronization).
|
||
|
|
||
|
- *MaterializationResponsibility* - When a MaterializationUnit hands a program
|
||
|
representation back to the layer it comes with an associated
|
||
|
MaterializationResponsibility object. This object tracks the definitions
|
||
|
that must be materialized and provides a way to notify the JITDylib once they
|
||
|
are either successfully materialized or a failure occurs.
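
The ownership shuffle described for *MaterializationUnit* can be sketched
with a toy model. The names ``ToyLayer`` and ``ToyMaterializationUnit`` are
hypothetical, the "program representation" is just a string, and the
responsibility-tracking object is left out to keep the sketch short.

.. code-block:: c++

  #include <memory>
  #include <string>
  #include <utility>
  #include <vector>

  // Toy layer: "compiles" a program representation handed to it.
  struct ToyLayer {
    std::vector<std::string> Emitted;

    // Takes ownership of the source on the stack, so no layer member has
    // to be searched (and locked) to find it.
    void emit(std::unique_ptr<std::string> Source) {
      Emitted.push_back("compiled:" + *Source);
    }
  };

  // Toy materialization unit: owns the source until materialization.
  class ToyMaterializationUnit {
  public:
    ToyMaterializationUnit(ToyLayer &Layer, std::vector<std::string> Syms,
                           std::unique_ptr<std::string> Source)
        : Layer(Layer), Syms(std::move(Syms)), Source(std::move(Source)) {}

    // Describes the definitions this unit provides.
    const std::vector<std::string> &symbols() const { return Syms; }

    // Hands the source back to the layer. The unit is spent afterwards,
    // so a second call does nothing.
    void materialize() {
      if (Source)
        Layer.emit(std::move(Source));
    }

  private:
    ToyLayer &Layer;
    std::vector<std::string> Syms;
    std::unique_ptr<std::string> Source;
  };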
|
||
|
|
||
|
Absolute Symbols, Aliases, and Reexports
|
||
|
========================================
|
||
|
|
||
|
ORC makes it easy to define symbols with absolute addresses, or symbols that
|
||
|
are simply aliases of other symbols:
|
||
|
|
||
|
Absolute Symbols
|
||
|
----------------
|
||
|
|
||
|
Absolute symbols are symbols that map directly to addresses without requiring
|
||
|
further materialization, for example: "foo" = 0x1234. One use case for
|
||
|
absolute symbols is allowing resolution of process symbols. E.g.
|
||
|
|
||
|
.. code-block:: c++

  JD.define(absoluteSymbols(SymbolMap({
      { Mangle("printf"),
        { pointerToJITTargetAddress(&printf),
          JITSymbolFlags::Callable } }
    })));
|
||
|
|
||
|
With this mapping established code added to the JIT can refer to printf
|
||
|
symbolically rather than requiring the address of printf to be "baked in".
|
||
|
This in turn allows cached versions of the JIT'd code (e.g. compiled objects)
|
||
|
to be re-used across JIT sessions as the JIT'd code no longer changes, only the
|
||
|
absolute symbol definition does.
|
||
|
|
||
|
For process and library symbols the DynamicLibrarySearchGenerator utility (See
|
||
|
:ref:`How to Add Process and Library Symbols to JITDylibs
|
||
|
<ProcessAndLibrarySymbols>`) can be used to automatically build absolute
|
||
|
symbol mappings for you. However the absoluteSymbols function is still useful
|
||
|
for making non-global objects in your JIT visible to JIT'd code. For example,
|
||
|
imagine that your JIT standard library needs access to your JIT object to make
|
||
|
some calls. We could bake the address of your object into the library, but then
|
||
|
it would need to be recompiled for each session:
|
||
|
|
||
|
.. code-block:: c++

  // From standard library for JIT'd code:

  class MyJIT {
  public:
    void log(const char *Msg);
  };

  void log(const char *Msg) { ((MyJIT*)0x1234)->log(Msg); }
|
||
|
|
||
|
We can turn this into a symbolic reference in the JIT standard library:
|
||
|
|
||
|
.. code-block:: c++

  extern MyJIT *__MyJITInstance;

  void log(const char *Msg) { __MyJITInstance->log(Msg); }
|
||
|
|
||
|
And then make our JIT object visible to the JIT standard library with an
|
||
|
absolute symbol definition when the JIT is started:
|
||
|
|
||
|
.. code-block:: c++

  MyJIT J = ...;

  auto &JITStdLibJD = ... ;

  JITStdLibJD.define(absoluteSymbols(SymbolMap({
      { Mangle("__MyJITInstance"),
        { pointerToJITTargetAddress(&J), JITSymbolFlags() } }
    })));
|
||
|
|
||
|
Aliases and Reexports
|
||
|
---------------------
|
||
|
|
||
|
Aliases and reexports allow you to define new symbols that map to existing
|
||
|
symbols. This can be useful for changing linkage relationships between symbols
|
||
|
across sessions without having to recompile code. For example, imagine that
|
||
|
JIT'd code has access to a log function, ``void log(const char*)`` for which
|
||
|
there are two implementations in the JIT standard library: ``log_fast`` and
|
||
|
``log_detailed``. Your JIT can choose which one of these definitions will be
|
||
|
used when the ``log`` symbol is referenced by setting up an alias at JIT startup
|
||
|
time:
|
||
|
|
||
|
.. code-block:: c++

  auto &JITStdLibJD = ... ;

  auto LogImplementationSymbol =
    Verbose ? Mangle("log_detailed") : Mangle("log_fast");

  JITStdLibJD.define(
    symbolAliases(SymbolAliasMap({
        { Mangle("log"),
          { LogImplementationSymbol,
            JITSymbolFlags::Exported | JITSymbolFlags::Callable } }
      })));
|
||
|
|
||
|
The ``symbolAliases`` function allows you to define aliases within a single
|
||
|
JITDylib. The ``reexports`` function provides the same functionality, but
|
||
|
operates across JITDylib boundaries. E.g.
|
||
|
|
||
|
.. code-block:: c++

  auto &JD1 = ... ;
  auto &JD2 = ... ;

  // Make 'bar' in JD2 an alias for 'foo' from JD1.
  JD2.define(
    reexports(JD1, SymbolAliasMap({
        { Mangle("bar"), { Mangle("foo"), JITSymbolFlags::Exported } }
      })));
|
||
|
|
||
|
The reexports utility can be handy for composing a single JITDylib interface by
|
||
|
re-exporting symbols from several other JITDylibs.
|
||
|
|
||
|
.. _Laziness:
|
||
|
|
||
|
Laziness
|
||
|
========
|
||
|
|
||
|
Laziness in ORC is provided by a utility called "lazy reexports". A lazy
|
||
|
reexport is similar to a regular reexport or alias: It provides a new name for
|
||
|
an existing symbol. Unlike regular reexports however, lookups of lazy reexports
|
||
|
do not trigger immediate materialization of the reexported symbol. Instead, they
|
||
|
only trigger materialization of a function stub. This function stub is
|
||
|
initialized to point at a *lazy call-through*, which provides reentry into the
|
||
|
JIT. If the stub is called at runtime then the lazy call-through will look up
|
||
|
the reexported symbol (triggering materialization for it if necessary), update
|
||
|
the stub (to call directly to the reexported symbol on subsequent calls), and
|
||
|
then return via the reexported symbol. By re-using the existing symbol lookup
|
||
|
mechanism, lazy reexports inherit the same concurrency guarantees: calls to lazy
|
||
|
reexports can be made from multiple threads concurrently, and the reexported
|
||
|
symbol can be in any state of compilation (uncompiled, already in the process of
|
||
|
being compiled, or already compiled) and the call will succeed. This allows
|
||
|
laziness to be safely mixed with features like remote compilation, concurrent
|
||
|
compilation, concurrent JIT'd code, and speculative compilation.
|
||
|
|
||
|
There is one other key difference between regular reexports and lazy reexports
|
||
|
that some clients must be aware of: The address of a lazy reexport will be
|
||
|
*different* from the address of the reexported symbol (whereas a regular
|
||
|
reexport is guaranteed to have the same address as the reexported symbol).
|
||
|
Clients who care about pointer equality will generally want to use the address
|
||
|
of the reexport as the canonical address of the reexported symbol. This will
|
||
|
allow the address to be taken without forcing materialization of the reexport.
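
The stub and call-through mechanism can be modelled in plain C++. This is a
sketch only: every name below is hypothetical, the "lookup" is hard-wired to a
single function, and real stubs are generated machine code rather than C++
functions.

.. code-block:: c++

  #include <atomic>

  using Fn = int (*)(int);

  // Stands in for a function body that has not been compiled yet.
  int fooBody(int X) { return X + 1; }

  // Counts how often the slow path (re-entry into the "JIT") is taken.
  std::atomic<int> NumLookups{0};

  int lazyCallThrough(int X);

  // The stub's target: initially the lazy call-through.
  std::atomic<Fn> StubTarget{&lazyCallThrough};

  // Slow path: "look up" the body, repoint the stub, then run the body.
  int lazyCallThrough(int X) {
    ++NumLookups;
    Fn Body = &fooBody; // a real JIT would perform a symbol lookup here
    StubTarget.store(Body);
    return Body(X);
  }

  // The stub itself: its address is fixed, only its target changes.
  int fooStub(int X) { return StubTarget.load()(X); }

Only the first call to ``fooStub`` takes the slow path; later calls jump
straight to ``fooBody``. The address of ``fooStub`` never equals the address
of ``fooBody``, which is the pointer-equality caveat noted above.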
|
||
|
|
||
|
Usage example:
|
||
|
|
||
|
If JITDylib ``JD`` contains definitions for symbols ``foo_body`` and
|
||
|
``bar_body``, we can create lazy entry points ``foo`` and ``bar`` in JITDylib
|
||
|
``JD2`` by calling:
|
||
|
|
||
|
.. code-block:: c++

  auto ReexportFlags = JITSymbolFlags::Exported | JITSymbolFlags::Callable;
  JD2.define(
    lazyReexports(CallThroughMgr, StubsMgr, JD,
                  SymbolAliasMap({
                    { Mangle("foo"), { Mangle("foo_body"), ReexportFlags } },
                    { Mangle("bar"), { Mangle("bar_body"), ReexportFlags } }
                  })));
|
||
|
|
||
|
A full example of how to use lazyReexports with the LLJIT class can be found at
|
||
|
``llvm_project/llvm/examples/LLJITExamples/LLJITWithLazyReexports``.
|
||
|
|
||
|
Supporting Custom Compilers
|
||
|
===========================
|
||
|
|
||
|
TBD.
|
||
|
|
||
|
.. _transitioning_orcv1_to_orcv2:
|
||
|
|
||
|
Transitioning from ORCv1 to ORCv2
|
||
|
=================================
|
||
|
|
||
|
Since LLVM 7.0, new ORC development work has focused on adding support for
|
||
|
concurrent JIT compilation. The new APIs (including new layer interfaces and
|
||
|
implementations, and new utilities) that support concurrency are collectively
|
||
|
referred to as ORCv2, and the original, non-concurrent layers and utilities
|
||
|
are now referred to as ORCv1.
|
||
|
|
||
|
The majority of the ORCv1 layers and utilities were renamed with a 'Legacy'
|
||
|
prefix in LLVM 8.0, and have deprecation warnings attached in LLVM 9.0. In LLVM
|
||
|
12.0 ORCv1 will be removed entirely.
|
||
|
|
||
|
Transitioning from ORCv1 to ORCv2 should be easy for most clients. Most of the
|
||
|
ORCv1 layers and utilities have ORCv2 counterparts [2]_ that can be directly
|
||
|
substituted. However there are some design differences between ORCv1 and ORCv2
|
||
|
to be aware of:
|
||
|
|
||
|
1. ORCv2 fully adopts the JIT-as-linker model that began with MCJIT. Modules
|
||
|
(and other program representations, e.g. Object Files) are no longer added
|
||
|
directly to JIT classes or layers. Instead, they are added to ``JITDylib``
|
||
|
instances *by* layers. The ``JITDylib`` determines *where* the definitions
|
||
|
reside, the layers determine *how* the definitions will be compiled.
|
||
|
Linkage relationships between ``JITDylibs`` determine how inter-module
|
||
|
references are resolved, and symbol resolvers are no longer used. See the
|
||
|
section `Design Overview`_ for more details.
|
||
|
|
||
|
Unless multiple JITDylibs are needed to model linkage relationships, ORCv1
|
||
|
clients should place all code in a single JITDylib.
|
||
|
MCJIT clients should use LLJIT (see `LLJIT and LLLazyJIT`_), and can place
|
||
|
code in LLJIT's default created main JITDylib (See
|
||
|
``LLJIT::getMainJITDylib()``).
|
||
|
|
||
|
2. All JIT stacks now need an ``ExecutionSession`` instance. ExecutionSession
|
||
|
manages the string pool, error reporting, synchronization, and symbol
|
||
|
lookup.
|
||
|
|
||
|
3. ORCv2 uses uniqued strings (``SymbolStringPtr`` instances) rather than
|
||
|
string values in order to reduce memory overhead and improve lookup
|
||
|
performance. See the subsection `How to manage symbol strings`_.
|
||
|
|
||
|
4. IR layers require ThreadSafeModule instances, rather than
|
||
|
std::unique_ptr<Module>s. ThreadSafeModule is a wrapper that ensures that
|
||
|
Modules that use the same LLVMContext are not accessed concurrently.
|
||
|
See `How to use ThreadSafeModule and ThreadSafeContext`_.
|
||
|
|
||
|
5. Symbol lookup is no longer handled by layers. Instead, there is a
   ``lookup`` method on ExecutionSession that takes a list of JITDylibs to scan.

   .. code-block:: c++

     ExecutionSession ES;
     JITDylib &JD1 = ...;
     JITDylib &JD2 = ...;

     auto Sym = ES.lookup({&JD1, &JD2}, ES.intern("_main"));
|
||
|
|
||
|
6. Module removal is not yet supported. There is no equivalent of the
|
||
|
layer concept removeModule/removeObject methods. Work on resource tracking
|
||
|
and removal in ORCv2 is ongoing.
|
||
|
|
||
|
For code examples and suggestions of how to use the ORCv2 APIs, please see
|
||
|
the section `How-tos`_.
|
||
|
|
||
|
How-tos
|
||
|
=======
|
||
|
|
||
|
How to manage symbol strings
|
||
|
----------------------------
|
||
|
|
||
|
Symbol strings in ORC are uniqued to improve lookup performance, reduce memory
|
||
|
overhead, and allow symbol names to function as efficient keys. To get the
|
||
|
unique ``SymbolStringPtr`` for a string value, call the
|
||
|
``ExecutionSession::intern`` method:
|
||
|
|
||
|
.. code-block:: c++

  ExecutionSession ES;
  // ...
  auto MainSymbolName = ES.intern("main");
|
||
|
|
||
|
If you wish to perform lookup using the C/IR name of a symbol you will also
|
||
|
need to apply the platform linker-mangling before interning the string. On
|
||
|
Linux this mangling is a no-op, but on other platforms it usually involves
|
||
|
adding a prefix to the string (e.g. '_' on Darwin). The mangling scheme is
|
||
|
based on the DataLayout for the target. Given a DataLayout and an
|
||
|
ExecutionSession, you can create a MangleAndInterner function object that
|
||
|
will perform both jobs for you:
|
||
|
|
||
|
.. code-block:: c++

  ExecutionSession ES;
  const DataLayout &DL = ...;
  MangleAndInterner Mangle(ES, DL);

  // ...

  // Portable IR-symbol-name lookup:
  auto Sym = ES.lookup({&MainJD}, Mangle("main"));
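
To see why uniqued strings make efficient keys, consider the following toy
model of interning and mangling. ``ToyStringPool`` and
``ToyMangleAndInterner`` are hypothetical names; the real mangling is derived
from the DataLayout, whereas this sketch simply takes the prefix as a
constructor argument.

.. code-block:: c++

  #include <set>
  #include <string>
  #include <utility>

  // Toy string pool: equal strings intern to the same pointer, so symbol
  // names can be compared by comparing pointers.
  class ToyStringPool {
  public:
    const std::string *intern(const std::string &S) {
      // std::set never moves its elements, so the address stays valid.
      return &*Pool.insert(S).first;
    }

  private:
    std::set<std::string> Pool;
  };

  // Toy mangle-and-intern helper with a fixed global prefix. An empty
  // prefix models Linux; "_" models Darwin.
  class ToyMangleAndInterner {
  public:
    ToyMangleAndInterner(ToyStringPool &Pool, std::string Prefix)
        : Pool(Pool), Prefix(std::move(Prefix)) {}

    const std::string *operator()(const std::string &Name) {
      return Pool.intern(Prefix + Name);
    }

  private:
    ToyStringPool &Pool;
    std::string Prefix;
  };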
|
||
|
|
||
|
How to create JITDylibs and set up linkage relationships
|
||
|
--------------------------------------------------------
|
||
|
|
||
|
In ORC, all symbol definitions reside in JITDylibs. JITDylibs are created by
|
||
|
calling the ``ExecutionSession::createJITDylib`` method with a unique name:
|
||
|
|
||
|
.. code-block:: c++

  ExecutionSession ES;
  auto &JD = ES.createJITDylib("libFoo.dylib");
|
||
|
|
||
|
The JITDylib is owned by the ``ExecutionSession`` instance and will be freed
|
||
|
when it is destroyed.
|
||
|
|
||
|
How to use ThreadSafeModule and ThreadSafeContext
|
||
|
-------------------------------------------------
|
||
|
|
||
|
ThreadSafeModule and ThreadSafeContext are wrappers around Modules and
|
||
|
LLVMContexts respectively. A ThreadSafeModule is a pair of a
|
||
|
std::unique_ptr<Module> and a (possibly shared) ThreadSafeContext value. A
|
||
|
ThreadSafeContext is a pair of a std::unique_ptr<LLVMContext> and a lock.
|
||
|
This design serves two purposes: providing a locking scheme and lifetime
|
||
|
management for LLVMContexts. The ThreadSafeContext may be locked to prevent
|
||
|
accidental concurrent access by two Modules that use the same LLVMContext.
|
||
|
The underlying LLVMContext is freed once all ThreadSafeContext values pointing
|
||
|
to it are destroyed, allowing the context memory to be reclaimed as soon as
|
||
|
the Modules referring to it are destroyed.
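
The lifetime and locking scheme can be sketched with standard library types.
This is a simplified model under stated assumptions: ``ToyContext`` stands in
for LLVMContext, ``ToyThreadSafeContext`` is a hypothetical name, and the
choice of a recursive mutex is an assumption of this sketch.

.. code-block:: c++

  #include <memory>
  #include <mutex>
  #include <utility>

  // Stand-in for LLVMContext that records its own destruction.
  struct ToyContext {
    explicit ToyContext(bool &Destroyed) : Destroyed(Destroyed) {}
    ~ToyContext() { Destroyed = true; }
    bool &Destroyed;
  };

  // Toy ThreadSafeContext: a copyable handle to a shared (context, lock) pair.
  class ToyThreadSafeContext {
    struct State {
      explicit State(std::unique_ptr<ToyContext> C) : Ctx(std::move(C)) {}
      std::unique_ptr<ToyContext> Ctx;
      std::recursive_mutex Mutex;
    };

  public:
    // Holds the lock, and keeps the shared state alive, until destroyed.
    class Lock {
    public:
      explicit Lock(std::shared_ptr<State> St)
          : S(std::move(St)), L(S->Mutex) {}

    private:
      std::shared_ptr<State> S;                 // keeps the mutex alive
      std::unique_lock<std::recursive_mutex> L; // released first
    };

    explicit ToyThreadSafeContext(std::unique_ptr<ToyContext> C)
        : S(std::make_shared<State>(std::move(C))) {}

    ToyContext *getContext() { return S->Ctx.get(); }
    Lock getLock() { return Lock(S); }

  private:
    std::shared_ptr<State> S;
  };

Copies of the handle share one context, and the context is destroyed only
when the last copy (or outstanding lock) goes away.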
|
||
|
|
||
|
ThreadSafeContexts can be explicitly constructed from a
|
||
|
std::unique_ptr<LLVMContext>:
|
||
|
|
||
|
.. code-block:: c++

  ThreadSafeContext TSCtx(std::make_unique<LLVMContext>());
|
||
|
|
||
|
ThreadSafeModules can be constructed from a pair of a std::unique_ptr<Module>
|
||
|
and a ThreadSafeContext value. ThreadSafeContext values may be shared between
|
||
|
multiple ThreadSafeModules:
|
||
|
|
||
|
.. code-block:: c++

  ThreadSafeModule TSM1(
    std::make_unique<Module>("M1", *TSCtx.getContext()), TSCtx);

  ThreadSafeModule TSM2(
    std::make_unique<Module>("M2", *TSCtx.getContext()), TSCtx);
|
||
|
|
||
|
Before using a ThreadSafeContext, clients should ensure that either the context
|
||
|
is only accessible on the current thread, or that the context is locked. In the
|
||
|
example above (where the context is never locked) we rely on the fact that both
|
||
|
``TSM1`` and ``TSM2``, and TSCtx are all created on one thread. If a context is
|
||
|
going to be shared between threads then it must be locked before accessing
|
||
|
or creating any Modules attached to it. E.g.
|
||
|
|
||
|
.. code-block:: c++

  ThreadSafeContext TSCtx(std::make_unique<LLVMContext>());

  ThreadPool TP(NumThreads);
  JITStack J;

  for (auto &ModulePath : ModulePaths) {
    TP.async(
      [&]() {
        auto Lock = TSCtx.getLock();
        auto M = loadModuleOnContext(ModulePath, TSCtx.getContext());
        J.addModule(ThreadSafeModule(std::move(M), TSCtx));
      });
  }

  TP.wait();
|
||
|
|
||
|
To make exclusive access to Modules easier to manage the ThreadSafeModule class
|
||
|
provides a convenience function, ``withModuleDo``, that implicitly (1) locks the
|
||
|
associated context, (2) runs a given function object, (3) unlocks the context,
|
||
|
and (4) returns the result generated by the function object. E.g.
|
||
|
|
||
|
.. code-block:: c++

  ThreadSafeModule TSM = getModule(...);

  // Count the functions in the module:
  size_t NumFunctionsInModule =
    TSM.withModuleDo(
      [](Module &M) { // <- Context locked before entering lambda.
        return M.size();
      } // <- Context unlocked after leaving.
    );
|
||
|
|
||
|
Clients wishing to maximize possibilities for concurrent compilation will want
|
||
|
to create every new ThreadSafeModule on a new ThreadSafeContext. For this
|
||
|
reason a convenience constructor for ThreadSafeModule is provided that implicitly
|
||
|
constructs a new ThreadSafeContext value from a std::unique_ptr<LLVMContext>:
|
||
|
|
||
|
.. code-block:: c++

  // Maximize concurrency opportunities by loading every module on a
  // separate context.
  for (const auto &IRPath : IRPaths) {
    auto Ctx = std::make_unique<LLVMContext>();
    auto M = std::make_unique<Module>("M", *Ctx);
    CompileLayer.add(MainJD, ThreadSafeModule(std::move(M), std::move(Ctx)));
  }
|
||
|
|
||
|
Clients who plan to run single-threaded may choose to save memory by loading
|
||
|
all modules on the same context:
|
||
|
|
||
|
.. code-block:: c++

  // Save memory by using one context for all Modules:
  ThreadSafeContext TSCtx(std::make_unique<LLVMContext>());
  for (const auto &IRPath : IRPaths) {
    ThreadSafeModule TSM(parsePath(IRPath, *TSCtx.getContext()), TSCtx);
    CompileLayer.add(MainJD, std::move(TSM));
  }
|
||
|
|
||
|
.. _ProcessAndLibrarySymbols:

How to Add Process and Library Symbols to the JITDylibs
=======================================================

JIT'd code typically needs access to symbols in the host program or in
supporting libraries. References to process symbols can be "baked in" to code
as it is compiled by turning external references into pre-resolved integer
constants. However, this ties the JIT'd code to the current process's virtual
memory layout (meaning that it cannot be cached between runs) and makes
debugging lower level program representations difficult (as all external
references are opaque integer values). A better solution is to maintain symbolic
external references and let the jit-linker bind them for you at runtime. To
allow the JIT linker to find these external definitions, their addresses must
be added to a JITDylib that the JIT'd definitions link against.

Definitions for external symbols can be added using the absoluteSymbols
function:

.. code-block:: c++

  const DataLayout &DL = getDataLayout();
  MangleAndInterner Mangle(ES, DL);

  auto &JD = ES.createJITDylib("main");

  JD.define(
    absoluteSymbols({
      { Mangle("puts"), pointerToJITTargetAddress(&puts)},
      { Mangle("gets"), pointerToJITTargetAddress(&gets)}
    }));

However, manually adding absolute symbols for a large or changing interface is
cumbersome, so ORC provides an alternative that generates new definitions on
demand: *definition generators*. If a definition generator is attached to a
JITDylib, then any unsuccessful lookup on that JITDylib will fall back to
calling the definition generator, and the definition generator may choose to
generate a new definition for the missing symbols. Of particular use here is the
``DynamicLibrarySearchGenerator`` utility. This can be used to reflect the whole
exported symbol set of the process or a specific dynamic library, or a subset
of either of these determined by a predicate.

For example, to load the whole interface of a runtime library:

.. code-block:: c++

  const DataLayout &DL = getDataLayout();
  auto &JD = ES.createJITDylib("main");

  JD.setGenerator(DynamicLibrarySearchGenerator::Load("/path/to/lib",
                                                      DL.getGlobalPrefix()));

  // IR added to JD can now link against all symbols exported by the library
  // at '/path/to/lib'.
  CompileLayer.add(JD, loadModule(...));

Or, to expose an allowed set of symbols from the main process:

.. code-block:: c++

  const DataLayout &DL = getDataLayout();
  MangleAndInterner Mangle(ES, DL);

  auto &JD = ES.createJITDylib("main");

  DenseSet<SymbolStringPtr> AllowList({
      Mangle("puts"),
      Mangle("gets")
    });

  // Use GetForCurrentProcess with a predicate function that checks the
  // allowed list.
  JD.setGenerator(
    DynamicLibrarySearchGenerator::GetForCurrentProcess(
      DL.getGlobalPrefix(),
      [&](const SymbolStringPtr &S) { return AllowList.count(S); }));

  // IR added to JD can now link against any symbols exported by the process
  // and contained in the list.
  CompileLayer.add(JD, loadModule(...));

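The fallback behaviour of a definition generator with an allow-list predicate
can be sketched with a toy model built on the C++ standard library alone. None
of the names below (``ToyDylib``, ``ToyGenerator``, ``makeAllowListGenerator``)
are ORC APIs; they are hypothetical stand-ins chosen for illustration, and
symbol addresses are modelled as plain integers. Recording the generated
definition in the table is a simplifying assumption of this sketch.

.. code-block:: c++

  #include <cstdint>
  #include <functional>
  #include <map>
  #include <optional>
  #include <set>
  #include <string>
  #include <utility>

  // Toy stand-in for a symbol table: name -> address.
  using ToySymbolMap = std::map<std::string, std::uint64_t>;

  // Toy stand-in for a definition generator: given a missing name, it may
  // produce an address for it, or decline by returning std::nullopt.
  using ToyGenerator =
      std::function<std::optional<std::uint64_t>(const std::string &)>;

  class ToyDylib {
  public:
    void define(const std::string &Name, std::uint64_t Addr) {
      Symbols[Name] = Addr;
    }
    void setGenerator(ToyGenerator G) { Generator = std::move(G); }

    // Look in the table first; only an unsuccessful lookup falls back to
    // the generator.
    std::optional<std::uint64_t> lookup(const std::string &Name) {
      auto I = Symbols.find(Name);
      if (I != Symbols.end())
        return I->second;
      if (Generator) {
        if (auto Addr = Generator(Name)) {
          Symbols[Name] = *Addr; // Record the newly generated definition.
          return Addr;
        }
      }
      return std::nullopt;
    }

  private:
    ToySymbolMap Symbols;
    ToyGenerator Generator;
  };

  // Build a generator that reflects the symbols in Source, restricted to the
  // names in Allowed (the predicate).
  ToyGenerator makeAllowListGenerator(ToySymbolMap Source,
                                      std::set<std::string> Allowed) {
    return [Source = std::move(Source), Allowed = std::move(Allowed)](
               const std::string &Name) -> std::optional<std::uint64_t> {
      if (!Allowed.count(Name))
        return std::nullopt;
      auto I = Source.find(Name);
      if (I == Source.end())
        return std::nullopt;
      return I->second;
    };
  }

In this model a lookup for a name that is present in ``Source`` but absent from
``Allowed`` fails, which is the effect the predicate has in the allow-list
example above.
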
Roadmap
=======

ORC is still undergoing active development. Some current and planned work is
listed below.

Current Work
------------

1. **TargetProcessControl: Improvements to in-tree support for out-of-process
   execution**

   The ``TargetProcessControl`` API provides various operations on the JIT
   target process (the one which will execute the JIT'd code), including
   memory allocation, memory writes, function execution, and process queries
   (e.g. for the target triple). By targeting this API, new components can be
   developed which will work equally well for in-process and out-of-process
   JITing.

2. **ORC RPC based TargetProcessControl implementation**

   An ORC RPC based implementation of the ``TargetProcessControl`` API is
   currently under development to enable easy out-of-process JITing via
   file descriptors / sockets.

3. **Core State Machine Cleanup**

   The core ORC state machine is currently split between ``JITDylib`` and
   ``ExecutionSession``. Methods are slowly being moved to
   ``ExecutionSession``. This will tidy up the code base, and also allow us to
   support asynchronous removal of JITDylibs (in practice deleting an
   associated state object in ``ExecutionSession`` and leaving the JITDylib
   instance in a defunct state until all references to it have been released).

4. **JITLink improvements**

   TBD. We really need a separate JITLink design document.

Near Future Work
----------------

1. **ORC JIT Runtime Libraries**

   We need a runtime library for JIT'd code. This would include things like
   TLS registration, reentry functions, registration code for language runtimes
   (e.g. Objective-C and Swift) and other JIT specific runtime code. This should
   be built in a similar manner to compiler-rt (possibly even as part of it).

2. **Remote jit_dlopen / jit_dlclose**

   To more fully mimic the environment that static programs operate in we would
   like JIT'd code to be able to "dlopen" and "dlclose" JITDylibs, running all of
   their initializers/deinitializers on the current thread. This would require
   support from the runtime library described above.

3. **Debugging support**

   ORC currently supports the GDBRegistrationListener API when using RuntimeDyld
   as the underlying JIT linker. We will need a new solution for JITLink based
   platforms.

Further Future Work
-------------------

1. **Speculative Compilation**

   ORC's support for concurrent compilation allows us to easily enable
   *speculative* JIT compilation: compilation of code that is not needed yet,
   but which we have reason to believe will be needed in the future. This can be
   used to hide compile latency and improve JIT throughput. A proof-of-concept
   example of speculative compilation with ORC has already been developed (see
   ``llvm/examples/SpeculativeJIT``). Future work on this is likely to focus on
   re-using and improving existing profiling support (currently used by PGO) to
   feed speculation decisions, as well as built-in tools to simplify use of
   speculative compilation.

.. [1] Formats/architectures vary in terms of supported features. MachO and
       ELF tend to have better support than COFF. Patches very welcome!

.. [2] The ``LazyEmittingLayer``, ``RemoteObjectClientLayer`` and
       ``RemoteObjectServerLayer`` do not have counterparts in the new
       system. In the case of ``LazyEmittingLayer`` it was simply no longer
       needed: in ORCv2, deferring compilation until symbols are looked up is
       the default. The removal of ``RemoteObjectClientLayer`` and
       ``RemoteObjectServerLayer`` means that JIT stacks can no longer be split
       across processes, however this functionality appears not to have been
       used.

.. [3] Weak definitions are currently handled correctly within dylibs, but if
       multiple dylibs provide a weak definition of a symbol then each will end
       up with its own definition (similar to how weak definitions are handled
       in Windows DLLs). This will be fixed in the future.