248 lines
9.1 KiB
ReStructuredText
248 lines
9.1 KiB
ReStructuredText
=======
|
|
ThinLTO
|
|
=======
|
|
|
|
.. contents::
|
|
:local:
|
|
|
|
Introduction
|
|
============
|
|
|
|
*ThinLTO* compilation is a new type of LTO that is both scalable and
|
|
incremental. *LTO* (Link Time Optimization) achieves better
|
|
runtime performance through whole-program analysis and cross-module
|
|
optimization. However, monolithic LTO implements this by merging all
|
|
input into a single module, which is not scalable
|
|
in time or memory, and also prevents fast incremental compiles.
|
|
|
|
In ThinLTO mode, as with regular LTO, clang emits LLVM bitcode after the
|
|
compile phase. The ThinLTO bitcode is augmented with a compact summary
|
|
of the module. During the link step, only the summaries are read and
|
|
merged into a combined summary index, which includes an index of function
|
|
locations for later cross-module function importing. Fast and efficient
|
|
whole-program analysis is then performed on the combined summary index.
|
|
|
|
However, all transformations, including function importing, occur
|
|
later when the modules are optimized in fully parallel backends.
|
|
By default, linkers_ that support ThinLTO are set up to launch
|
|
the ThinLTO backends in threads. So the usage model is not affected
|
|
as the distinction between the fast serial thin link step and the backends
|
|
is transparent to the user.
|
|
|
|
For more information on the ThinLTO design and current performance,
|
|
see the LLVM blog post `ThinLTO: Scalable and Incremental LTO
|
|
<http://blog.llvm.org/2016/06/thinlto-scalable-and-incremental-lto.html>`_.
|
|
While tuning is still in progress, results in the blog post show that
|
|
ThinLTO already performs well compared to LTO, in many cases matching
|
|
the performance improvement.
|
|
|
|
Current Status
|
|
==============
|
|
|
|
Clang/LLVM
|
|
----------
|
|
.. _compiler:
|
|
|
|
The 3.9 release of clang includes ThinLTO support. However, ThinLTO
|
|
is under active development, and new features, improvements and bugfixes
|
|
are being added for the next release. For the latest ThinLTO support,
|
|
`build a recent version of clang and LLVM
|
|
<https://llvm.org/docs/CMake.html>`_.
|
|
|
|
Linkers
|
|
-------
|
|
.. _linkers:
|
|
.. _linker:
|
|
|
|
ThinLTO is currently supported for the following linkers:
|
|
|
|
- **gold (via the gold-plugin)**:
|
|
Similar to monolithic LTO, this requires using
|
|
a `gold linker configured with plugins enabled
|
|
<https://llvm.org/docs/GoldPlugin.html>`_.
|
|
- **ld64**:
|
|
Starting with `Xcode 8 <https://developer.apple.com/xcode/>`_.
|
|
- **lld**:
|
|
Starting with r284050 for ELF, r298942 for COFF.
|
|
|
|
Usage
|
|
=====
|
|
|
|
Basic
|
|
-----
|
|
|
|
To utilize ThinLTO, simply add the -flto=thin option to compile and link. E.g.
|
|
|
|
.. code-block:: console
|
|
|
|
% clang -flto=thin -O2 file1.c file2.c -c
|
|
% clang -flto=thin -O2 file1.o file2.o -o a.out
|
|
|
|
When using lld-link, the -flto option need only be added to the compile step:
|
|
|
|
.. code-block:: console
|
|
|
|
% clang-cl -flto=thin -O2 -c file1.c file2.c
|
|
% lld-link /out:a.exe file1.obj file2.obj
|
|
|
|
As mentioned earlier, by default the linkers will launch the ThinLTO backend
|
|
threads in parallel, passing the resulting native object files back to the
|
|
linker for the final native link. As such, the usage model the same as
|
|
non-LTO.
|
|
|
|
With gold, if you see an error during the link of the form:
|
|
|
|
.. code-block:: console
|
|
|
|
/usr/bin/ld: error: /path/to/clang/bin/../lib/LLVMgold.so: could not load plugin library: /path/to/clang/bin/../lib/LLVMgold.so: cannot open shared object file: No such file or directory
|
|
|
|
Then either gold was not configured with plugins enabled, or clang
|
|
was not built with ``-DLLVM_BINUTILS_INCDIR`` set properly. See
|
|
the instructions for the
|
|
`LLVM gold plugin <https://llvm.org/docs/GoldPlugin.html#how-to-build-it>`_.
|
|
|
|
Controlling Backend Parallelism
|
|
-------------------------------
|
|
.. _parallelism:
|
|
|
|
By default, the ThinLTO link step will launch as many
|
|
threads in parallel as there are cores. If the number of
|
|
cores can't be computed for the architecture, then it will launch
|
|
``std::thread::hardware_concurrency`` number of threads in parallel.
|
|
For machines with hyper-threading, this is the total number of
|
|
virtual cores. For some applications and machine configurations this
|
|
may be too aggressive, in which case the amount of parallelism can
|
|
be reduced to ``N`` via:
|
|
|
|
- gold:
|
|
``-Wl,-plugin-opt,jobs=N``
|
|
- ld64:
|
|
``-Wl,-mllvm,-threads=N``
|
|
- lld:
|
|
``-Wl,--thinlto-jobs=N``
|
|
- lld-link:
|
|
``/opt:lldltojobs=N``
|
|
|
|
Other possible values for ``N`` are:
|
|
|
|
- 0:
|
|
Use one thread per physical core (default)
|
|
- 1:
|
|
Use a single thread only (disable multi-threading)
|
|
- all:
|
|
Use one thread per logical core (uses all hyper-threads)
|
|
|
|
Incremental
|
|
-----------
|
|
.. _incremental:
|
|
|
|
ThinLTO supports fast incremental builds through the use of a cache,
|
|
which currently must be enabled through a linker option.
|
|
|
|
- gold (as of LLVM 4.0):
|
|
``-Wl,-plugin-opt,cache-dir=/path/to/cache``
|
|
- ld64 (support in clang 3.9 and Xcode 8):
|
|
``-Wl,-cache_path_lto,/path/to/cache``
|
|
- ELF lld (as of LLVM 5.0):
|
|
``-Wl,--thinlto-cache-dir=/path/to/cache``
|
|
- COFF lld-link (as of LLVM 6.0):
|
|
``/lldltocache:/path/to/cache``
|
|
|
|
Cache Pruning
|
|
-------------
|
|
|
|
To help keep the size of the cache under control, ThinLTO supports cache
|
|
pruning. Cache pruning is supported with gold, ld64 and ELF and COFF lld, but
|
|
currently only gold, ELF and COFF lld allow you to control the policy with a
|
|
policy string. The cache policy must be specified with a linker option.
|
|
|
|
- gold (as of LLVM 6.0):
|
|
``-Wl,-plugin-opt,cache-policy=POLICY``
|
|
- ELF lld (as of LLVM 5.0):
|
|
``-Wl,--thinlto-cache-policy,POLICY``
|
|
- COFF lld-link (as of LLVM 6.0):
|
|
``/lldltocachepolicy:POLICY``
|
|
|
|
A policy string is a series of key-value pairs separated by ``:`` characters.
|
|
Possible key-value pairs are:
|
|
|
|
- ``cache_size=X%``: The maximum size for the cache directory is ``X`` percent
|
|
of the available space on the disk. Set to 100 to indicate no limit,
|
|
50 to indicate that the cache size will not be left over half the available
|
|
disk space. A value over 100 is invalid. A value of 0 disables the percentage
|
|
size-based pruning. The default is 75%.
|
|
|
|
- ``cache_size_bytes=X``, ``cache_size_bytes=Xk``, ``cache_size_bytes=Xm``,
|
|
``cache_size_bytes=Xg``:
|
|
Sets the maximum size for the cache directory to ``X`` bytes (or KB, MB,
|
|
GB respectively). A value over the amount of available space on the disk
|
|
will be reduced to the amount of available space. A value of 0 disables
|
|
the byte size-based pruning. The default is no byte size-based pruning.
|
|
|
|
Note that ThinLTO will apply both size-based pruning policies simultaneously,
|
|
and changing one does not affect the other. For example, a policy of
|
|
``cache_size_bytes=1g`` on its own will cause both the 1GB and default 75%
|
|
policies to be applied unless the default ``cache_size`` is overridden.
|
|
|
|
- ``cache_size_files=X``:
|
|
Set the maximum number of files in the cache directory. Set to 0 to indicate
|
|
no limit. The default is 1000000 files.
|
|
|
|
- ``prune_after=Xs``, ``prune_after=Xm``, ``prune_after=Xh``: Sets the
|
|
expiration time for cache files to ``X`` seconds (or minutes, hours
|
|
respectively). When a file hasn't been accessed for ``prune_after`` seconds,
|
|
it is removed from the cache. A value of 0 disables the expiration-based
|
|
pruning. The default is 1 week.
|
|
|
|
- ``prune_interval=Xs``, ``prune_interval=Xm``, ``prune_interval=Xh``:
|
|
Sets the pruning interval to ``X`` seconds (or minutes, hours
|
|
respectively). This is intended to be used to avoid scanning the directory
|
|
too often. It does not impact the decision of which files to prune. A
|
|
value of 0 forces the scan to occur. The default is every 20 minutes.
|
|
|
|
Clang Bootstrap
|
|
---------------
|
|
|
|
To `bootstrap clang/LLVM <https://llvm.org/docs/AdvancedBuilds.html#bootstrap-builds>`_
|
|
with ThinLTO, follow these steps:
|
|
|
|
1. The host compiler_ must be a version of clang that supports ThinLTO.
|
|
#. The host linker_ must support ThinLTO (and in the case of gold, must be
|
|
`configured with plugins enabled <https://llvm.org/docs/GoldPlugin.html>`_).
|
|
#. Use the following additional `CMake variables
|
|
<https://llvm.org/docs/CMake.html#options-and-variables>`_
|
|
when configuring the bootstrap compiler build:
|
|
|
|
* ``-DLLVM_ENABLE_LTO=Thin``
|
|
* ``-DCMAKE_C_COMPILER=/path/to/host/clang``
|
|
* ``-DCMAKE_CXX_COMPILER=/path/to/host/clang++``
|
|
* ``-DCMAKE_RANLIB=/path/to/host/llvm-ranlib``
|
|
* ``-DCMAKE_AR=/path/to/host/llvm-ar``
|
|
|
|
Or, on Windows:
|
|
|
|
* ``-DLLVM_ENABLE_LTO=Thin``
|
|
* ``-DCMAKE_C_COMPILER=/path/to/host/clang-cl.exe``
|
|
* ``-DCMAKE_CXX_COMPILER=/path/to/host/clang-cl.exe``
|
|
* ``-DCMAKE_LINKER=/path/to/host/lld-link.exe``
|
|
* ``-DCMAKE_RANLIB=/path/to/host/llvm-ranlib.exe``
|
|
* ``-DCMAKE_AR=/path/to/host/llvm-ar.exe``
|
|
|
|
#. To use additional linker arguments for controlling the backend
|
|
parallelism_ or enabling incremental_ builds of the bootstrap compiler,
|
|
after configuring the build, modify the resulting CMakeCache.txt file in the
|
|
build directory. Specify any additional linker options after
|
|
``CMAKE_EXE_LINKER_FLAGS:STRING=``. Note the configure may fail if
|
|
linker plugin options are instead specified directly in the previous step.
|
|
|
|
The ``BOOTSTRAP_LLVM_ENABLE_LTO=Thin`` will enable ThinLTO for stage 2 and
|
|
stage 3 in case the compiler used for stage 1 does not support the ThinLTO
|
|
option.
|
|
|
|
More Information
|
|
================
|
|
|
|
* From LLVM project blog:
|
|
`ThinLTO: Scalable and Incremental LTO
|
|
<http://blog.llvm.org/2016/06/thinlto-scalable-and-incremental-lto.html>`_
|