172 lines
7.3 KiB
ReStructuredText
172 lines
7.3 KiB
ReStructuredText
|
=============================================================
|
||
|
How To Build Clang and LLVM with Profile-Guided Optimizations
|
||
|
=============================================================
|
||
|
|
||
|
Introduction
|
||
|
============
|
||
|
|
||
|
PGO (Profile-Guided Optimization) allows your compiler to better optimize code
|
||
|
for how it actually runs. Users report that applying this to Clang and LLVM can
|
||
|
decrease overall compile time by 20%.
|
||
|
|
||
|
This guide walks you through how to build Clang with PGO, though it also applies
|
||
|
to other subprojects, such as LLD.
|
||
|
|
||
|
If you want to build other software with PGO, see the `end-user documentation
|
||
|
for PGO <https://clang.llvm.org/docs/UsersManual.html#profile-guided-optimization>`_.
|
||
|
|
||
|
|
||
|
Using preconfigured CMake caches
|
||
|
================================
|
||
|
|
||
|
See https://llvm.org/docs/AdvancedBuilds.html#multi-stage-pgo
|
||
|
|
||
|
Using the script
|
||
|
================
|
||
|
|
||
|
We have a script at ``utils/collect_and_build_with_pgo.py``. This script is
|
||
|
tested on a few Linux flavors, and requires a checkout of LLVM, Clang, and
|
||
|
compiler-rt. Despite the name, it performs four clean builds of Clang, so it
|
||
|
can take a while to run to completion. Please see the script's ``--help`` for
|
||
|
more information on how to run it, and the different options available to you.
|
||
|
If you want to get the most out of PGO for a particular use-case (e.g. compiling
|
||
|
a specific large piece of software), please do read the section below on
|
||
|
'benchmark' selection.
|
||
|
|
||
|
Please note that this script is only tested on a few Linux distros. Patches to
|
||
|
add support for other platforms, as always, are highly appreciated. :)
|
||
|
|
||
|
This script also supports a ``--dry-run`` option, which causes it to print
|
||
|
important commands instead of running them.
|
||
|
|
||
|
|
||
|
Selecting 'benchmarks'
|
||
|
======================
|
||
|
|
||
|
PGO does best when the profiles gathered represent how the user plans to use the
|
||
|
compiler. Notably, highly accurate profiles of llc building x86_64 code aren't
|
||
|
incredibly helpful if you're going to be targeting ARM.
|
||
|
|
||
|
By default, the script above does two things to get solid coverage. It:
|
||
|
|
||
|
- runs all of Clang and LLVM's lit tests, and
|
||
|
- uses the instrumented Clang to build Clang, LLVM, and all of the other
|
||
|
LLVM subprojects available to it.
|
||
|
|
||
|
Together, these should give you:
|
||
|
|
||
|
- solid coverage of building C++,
|
||
|
- good coverage of building C,
|
||
|
- great coverage of running optimizations,
|
||
|
- great coverage of the backend for your host's architecture, and
|
||
|
- some coverage of other architectures (if other arches are supported backends).
|
||
|
|
||
|
Altogether, this should cover a diverse set of uses for Clang and LLVM. If you
|
||
|
have very specific needs (e.g. your compiler is meant to compile a large browser
|
||
|
for four different platforms, or similar), you may want to do something else.
|
||
|
This is configurable in the script itself.
|
||
|
|
||
|
|
||
|
Building Clang with PGO
|
||
|
=======================
|
||
|
|
||
|
If you prefer to not use the script or the cmake cache, this briefly goes over
|
||
|
how to build Clang/LLVM with PGO.
|
||
|
|
||
|
First, you should have at least LLVM, Clang, and compiler-rt checked out
|
||
|
locally.
|
||
|
|
||
|
Next, at a high level, you're going to need to do the following:
|
||
|
|
||
|
1. Build a standard Release Clang and the relevant libclang_rt.profile library
|
||
|
2. Build Clang using the Clang you built above, but with instrumentation
|
||
|
3. Use the instrumented Clang to generate profiles, which consists of two steps:
|
||
|
|
||
|
- Running the instrumented Clang/LLVM/lld/etc. on tasks that represent how
|
||
|
users will use said tools.
|
||
|
- Using a tool to convert the "raw" profiles generated above into a single,
|
||
|
final PGO profile.
|
||
|
|
||
|
4. Build a final release Clang (along with whatever other binaries you need)
|
||
|
using the profile collected from your benchmark
|
||
|
|
||
|
In more detailed steps:
|
||
|
|
||
|
1. Configure a Clang build as you normally would. It's highly recommended that
|
||
|
you use the Release configuration for this, since it will be used to build
|
||
|
another Clang. Because you need Clang and supporting libraries, you'll want
|
||
|
to build the ``all`` target (e.g. ``ninja all`` or ``make -j4 all``).
|
||
|
|
||
|
2. Configure a Clang build as above, but add the following CMake args:
|
||
|
|
||
|
- ``-DLLVM_BUILD_INSTRUMENTED=IR`` -- This causes us to build everything
|
||
|
with instrumentation.
|
||
|
- ``-DLLVM_BUILD_RUNTIME=No`` -- A few projects have bad interactions when
|
||
|
built with profiling, and aren't necessary to build. This flag turns them
|
||
|
off.
|
||
|
- ``-DCMAKE_C_COMPILER=/path/to/stage1/clang`` - Use the Clang we built in
|
||
|
step 1.
|
||
|
- ``-DCMAKE_CXX_COMPILER=/path/to/stage1/clang++`` - Same as above.
|
||
|
|
||
|
In this build directory, you simply need to build the ``clang`` target (and
|
||
|
whatever supporting tooling your benchmark requires).
|
||
|
|
||
|
3. As mentioned above, this has two steps: gathering profile data, and then
|
||
|
massaging it into a useful form:
|
||
|
|
||
|
a. Build your benchmark using the Clang generated in step 2. The 'standard'
|
||
|
benchmark recommended is to run ``check-clang`` and ``check-llvm`` in your
|
||
|
instrumented Clang's build directory, and to do a full build of Clang/LLVM
|
||
|
using your instrumented Clang. So, create yet another build directory,
|
||
|
with the following CMake arguments:
|
||
|
|
||
|
- ``-DCMAKE_C_COMPILER=/path/to/stage2/clang`` - Use the Clang we built in
|
||
|
step 2.
|
||
|
- ``-DCMAKE_CXX_COMPILER=/path/to/stage2/clang++`` - Same as above.
|
||
|
|
||
|
If your users are fans of debug info, you may want to consider using
|
||
|
``-DCMAKE_BUILD_TYPE=RelWithDebInfo`` instead of
|
||
|
``-DCMAKE_BUILD_TYPE=Release``. This will grant better coverage of
|
||
|
debug info pieces of clang, but will take longer to complete and will
|
||
|
result in a much larger build directory.
|
||
|
|
||
|
It's recommended to build the ``all`` target with your instrumented Clang,
|
||
|
since more coverage is often better.
|
||
|
|
||
|
b. You should now have a few ``*.profraw`` files in
|
||
|
``path/to/stage2/profiles/``. You need to merge these using
|
||
|
``llvm-profdata`` (even if you only have one! The profile merge transforms
|
||
|
profraw into actual profile data, as well). This can be done with
|
||
|
``/path/to/stage1/llvm-profdata merge
|
||
|
-output=/path/to/output/profdata.prof path/to/stage2/profiles/*.profraw``.
|
||
|
|
||
|
4. Now, build your final, PGO-optimized Clang. To do this, you'll want to pass
|
||
|
the following additional arguments to CMake.
|
||
|
|
||
|
- ``-DLLVM_PROFDATA_FILE=/path/to/output/profdata.prof`` - Use the PGO
|
||
|
profile from the previous step.
|
||
|
- ``-DCMAKE_C_COMPILER=/path/to/stage1/clang`` - Use the Clang we built in
|
||
|
step 1.
|
||
|
- ``-DCMAKE_CXX_COMPILER=/path/to/stage1/clang++`` - Same as above.
|
||
|
|
||
|
From here, you can build whatever targets you need.
|
||
|
|
||
|
.. note::
|
||
|
You may see warnings about a mismatched profile in the build output. These
|
||
|
are generally harmless. To silence them, you can add
|
||
|
``-DCMAKE_C_FLAGS='-Wno-backend-plugin'
|
||
|
-DCMAKE_CXX_FLAGS='-Wno-backend-plugin'`` to your CMake invocation.
|
||
|
|
||
|
|
||
|
Congrats! You now have a Clang built with profile-guided optimizations, and you
|
||
|
can delete all but the final build directory if you'd like.
|
||
|
|
||
|
If this worked well for you and you plan on doing it often, there's a slight
|
||
|
optimization that can be made: LLVM and Clang have a tool called tblgen that's
|
||
|
built and run during the build process. While it's potentially nice to build
|
||
|
this for coverage as part of step 3, none of your other builds should benefit
|
||
|
from building it. You can pass the CMake options
|
||
|
``-DCLANG_TABLEGEN=/path/to/stage1/bin/clang-tblgen
|
||
|
-DLLVM_TABLEGEN=/path/to/stage1/bin/llvm-tblgen`` to steps 2 and onward to avoid
|
||
|
these useless rebuilds.
|