 2f2e036ca6
			
		
	
	
		2f2e036ca6
		
	
	
	
	
		
			
			Reviewed-by: Darren Kenny <darren.kenny@oracle.com> Signed-off-by: Alexander Bulekov <alxndr@bu.edu> Message-Id: <20201023150746.107063-13-alxndr@bu.edu> Signed-off-by: Thomas Huth <thuth@redhat.com>
		
			
				
	
	
		
			215 lines
		
	
	
		
			9.5 KiB
		
	
	
	
		
			Plaintext
		
	
	
	
	
	
			
		
		
	
	
			215 lines
		
	
	
		
			9.5 KiB
		
	
	
	
		
			Plaintext
		
	
	
	
	
	
| = Fuzzing =
 | |
| 
 | |
| == Introduction ==
 | |
| 
 | |
| This document describes the virtual-device fuzzing infrastructure in QEMU and
 | |
| how to use it to implement additional fuzzers.
 | |
| 
 | |
| == Basics ==
 | |
| 
 | |
| Fuzzing operates by passing inputs to an entry point/target function. The
 | |
| fuzzer tracks the code coverage triggered by the input. Based on these
 | |
| findings, the fuzzer mutates the input and repeats the fuzzing.
 | |
| 
 | |
| To fuzz QEMU, we rely on libfuzzer. Unlike other fuzzers such as AFL, libfuzzer
 | |
| is an _in-process_ fuzzer. For the developer, this means that it is their
 | |
| responsibility to ensure that state is reset between fuzzing-runs.
 | |
| 
 | |
| == Building the fuzzers ==
 | |
| 
 | |
| NOTE: If possible, build a 32-bit binary. When forking, the 32-bit fuzzer is
 | |
| much faster, since the page-map has a smaller size. This is due to the fact that
 | |
| AddressSanitizer mmaps ~20TB of memory, as part of its detection. This results
 | |
| in a large page-map, and a much slower fork().
 | |
| 
 | |
| To build the fuzzers, install a recent version of clang:
 | |
| Configure with (substitute the clang binaries with the version you installed).
 | |
| Here, enable-sanitizers, is optional but it allows us to reliably detect bugs
 | |
| such as out-of-bounds accesses, use-after-frees, double-frees etc.
 | |
| 
 | |
|     CC=clang-8 CXX=clang++-8 /path/to/configure --enable-fuzzing \
 | |
|                                                 --enable-sanitizers
 | |
| 
 | |
| Fuzz targets are built similarly to system/softmmu:
 | |
| 
 | |
|     make i386-softmmu/fuzz
 | |
| 
 | |
| This builds ./i386-softmmu/qemu-fuzz-i386
 | |
| 
 | |
| The first option to this command is: --fuzz-target=FUZZ_NAME
 | |
| To list all of the available fuzzers run qemu-fuzz-i386 with no arguments.
 | |
| 
 | |
| For example:
 | |
|     ./i386-softmmu/qemu-fuzz-i386 --fuzz-target=virtio-scsi-fuzz
 | |
| 
 | |
| Internally, libfuzzer parses all arguments that do not begin with "--".
 | |
| Information about these is available by passing -help=1
 | |
| 
 | |
| Now the only thing left to do is wait for the fuzzer to trigger potential
 | |
| crashes.
 | |
| 
 | |
| == Useful libFuzzer flags ==
 | |
| 
 | |
| As mentioned above, libFuzzer accepts some arguments. Passing -help=1 will list
 | |
| the available arguments. In particular, these arguments might be helpful:
 | |
| 
 | |
| $CORPUS_DIR/ : Specify a directory as the last argument to libFuzzer. libFuzzer
 | |
| stores each "interesting" input in this corpus directory. The next time you run
 | |
| libFuzzer, it will read all of the inputs from the corpus, and continue fuzzing
 | |
| from there. You can also specify multiple directories. libFuzzer loads existing
 | |
| inputs from all specified directories, but will only write new ones to the
 | |
| first one specified.
 | |
| 
 | |
| -max_len=4096 : specify the maximum byte-length of the inputs libFuzzer will
 | |
| generate.
 | |
| 
 | |
| -close_fd_mask={1,2,3} : close, stderr, or both. Useful for targets that
 | |
| trigger many debug/error messages, or create output on the serial console.
 | |
| 
 | |
| -jobs=4 -workers=4 : These arguments configure libFuzzer to run 4 fuzzers in
 | |
| parallel (4 fuzzing jobs in 4 worker processes). Alternatively, with only
 | |
| -jobs=N, libFuzzer automatically spawns a number of workers less than or equal
 | |
| to half the available CPU cores. Replace 4 with a number appropriate for your
 | |
| machine. Make sure to specify a $CORPUS_DIR, which will allow the parallel
 | |
| fuzzers to share information about the interesting inputs they find.
 | |
| 
 | |
| -use_value_profile=1 : For each comparison operation, libFuzzer computes 
 | |
| (caller_pc&4095) | (popcnt(Arg1 ^ Arg2) << 12) and places this in the coverage
 | |
| table. Useful for targets with "magic" constants. If Arg1 came from the fuzzer's
 | |
| input and Arg2 is a magic constant, then each time the Hamming distance
 | |
| between Arg1 and Arg2 decreases, libFuzzer adds the input to the corpus.
 | |
| 
 | |
| -shrink=1 : Tries to make elements of the corpus "smaller". Might lead to
 | |
| better coverage performance, depending on the target.
 | |
| 
 | |
| Note that libFuzzer's exact behavior will depend on the version of
 | |
| clang and libFuzzer used to build the device fuzzers.
 | |
| 
 | |
| == Generating Coverage Reports ==
 | |
| Code coverage is a crucial metric for evaluating a fuzzer's performance.
 | |
| libFuzzer's output provides a "cov: " column that provides a total number of
 | |
| unique blocks/edges covered. To examine coverage on a line-by-line basis we
 | |
| can use Clang coverage:
 | |
| 
 | |
|  1. Configure libFuzzer to store a corpus of all interesting inputs (see
 | |
|     CORPUS_DIR above)
 | |
|  2. ./configure the QEMU build with:
 | |
|     --enable-fuzzing \
 | |
|     --extra-cflags="-fprofile-instr-generate -fcoverage-mapping"
 | |
|  3. Re-run the fuzzer. Specify $CORPUS_DIR/* as an argument, telling libfuzzer
 | |
|     to execute all of the inputs in $CORPUS_DIR and exit. Once the process
 | |
|     exits, you should find a file, "default.profraw" in the working directory.
 | |
|  4. Execute these commands to generate a detailed HTML coverage-report:
 | |
|  llvm-profdata merge -output=default.profdata default.profraw
 | |
|  llvm-cov show ./path/to/qemu-fuzz-i386 -instr-profile=default.profdata \
 | |
|  --format html -output-dir=/path/to/output/report
 | |
| 
 | |
| == Adding a new fuzzer ==
 | |
| Coverage over virtual devices can be improved by adding additional fuzzers.
 | |
| Fuzzers are kept in tests/qtest/fuzz/ and should be added to
 | |
| tests/qtest/fuzz/Makefile.include
 | |
| 
 | |
| Fuzzers can rely on both qtest and libqos to communicate with virtual devices.
 | |
| 
 | |
| 1. Create a new source file. For example ``tests/qtest/fuzz/foo-device-fuzz.c``.
 | |
| 
 | |
| 2. Write the fuzzing code using the libqtest/libqos API. See existing fuzzers
 | |
| for reference.
 | |
| 
 | |
| 3. Register the fuzzer in ``tests/fuzz/Makefile.include`` by appending the
 | |
| corresponding object to fuzz-obj-y
 | |
| 
 | |
| Fuzzers can be more-or-less thought of as special qtest programs which can
 | |
| modify the qtest commands and/or qtest command arguments based on inputs
 | |
| provided by libfuzzer. Libfuzzer passes a byte array and length. Commonly the
 | |
| fuzzer loops over the byte-array interpreting it as a list of qtest commands,
 | |
| addresses, or values.
 | |
| 
 | |
| == The Generic Fuzzer ==
 | |
| Writing a fuzz target can be a lot of effort (especially if a device driver has
 | |
| not be built-out within libqos). Many devices can be fuzzed to some degree,
 | |
| without any device-specific code, using the generic-fuzz target.
 | |
| 
 | |
| The generic-fuzz target is capable of fuzzing devices over their PIO, MMIO,
 | |
| and DMA input-spaces. To apply the generic-fuzz to a device, we need to define
 | |
| two env-variables, at minimum:
 | |
| 
 | |
| QEMU_FUZZ_ARGS= is the set of QEMU arguments used to configure a machine, with
 | |
| the device attached. For example, if we want to fuzz the virtio-net device
 | |
| attached to a pc-i440fx machine, we can specify:
 | |
| QEMU_FUZZ_ARGS="-M pc -nodefaults -netdev user,id=user0 \
 | |
|                 -device virtio-net,netdev=user0"
 | |
| 
 | |
| QEMU_FUZZ_OBJECTS= is a set of space-delimited strings used to identify the
 | |
| MemoryRegions that will be fuzzed. These strings are compared against
 | |
| MemoryRegion names and MemoryRegion owner names, to decide whether each
 | |
| MemoryRegion should be fuzzed. These strings support globbing. For the
 | |
| virtio-net example, we could use QEMU_FUZZ_OBJECTS=
 | |
|  * 'virtio-net'
 | |
|  * 'virtio*'
 | |
|  * 'virtio* pcspk' (Fuzz the virtio devices and the PC speaker...)
 | |
|  * '*' (Fuzz the whole machine)
 | |
| 
 | |
| The "info mtree" and "info qom-tree" monitor commands can be especially useful
 | |
| for identifying the MemoryRegion and Object names used for matching.
 | |
| 
 | |
| As a generic rule-of-thumb, the more MemoryRegions/Devices we match, the greater
 | |
| the input-space, and the smaller the probability of finding crashing inputs for
 | |
| individual devices. As such, it is usually a good idea to limit the fuzzer to
 | |
| only a few MemoryRegions.
 | |
| 
 | |
| To ensure that these env variables have been configured correctly, we can use:
 | |
| 
 | |
| ./qemu-fuzz-i386 --fuzz-target=generic-fuzz -runs=0
 | |
| 
 | |
| The output should contain a complete list of matched MemoryRegions.
 | |
| 
 | |
| = Implementation Details =
 | |
| 
 | |
| == The Fuzzer's Lifecycle ==
 | |
| 
 | |
| The fuzzer has two entrypoints that libfuzzer calls. libfuzzer provides it's
 | |
| own main(), which performs some setup, and calls the entrypoints:
 | |
| 
 | |
| LLVMFuzzerInitialize: called prior to fuzzing. Used to initialize all of the
 | |
| necessary state
 | |
| 
 | |
| LLVMFuzzerTestOneInput: called for each fuzzing run. Processes the input and
 | |
| resets the state at the end of each run.
 | |
| 
 | |
| In more detail:
 | |
| 
 | |
| LLVMFuzzerInitialize parses the arguments to the fuzzer (must start with two
 | |
| dashes, so they are ignored by libfuzzer main()). Currently, the arguments
 | |
| select the fuzz target. Then, the qtest client is initialized. If the target
 | |
| requires qos, qgraph is set up and the QOM/LIBQOS modules are initialized.
 | |
| Then the QGraph is walked and the QEMU cmd_line is determined and saved.
 | |
| 
 | |
| After this, the vl.c:qemu__main is called to set up the guest. There are
 | |
| target-specific hooks that can be called before and after qemu_main, for
 | |
| additional setup(e.g. PCI setup, or VM snapshotting).
 | |
| 
 | |
| LLVMFuzzerTestOneInput: Uses qtest/qos functions to act based on the fuzz
 | |
| input. It is also responsible for manually calling the main loop/main_loop_wait
 | |
| to ensure that bottom halves are executed and any cleanup required before the
 | |
| next input.
 | |
| 
 | |
| Since the same process is reused for many fuzzing runs, QEMU state needs to
 | |
| be reset at the end of each run. There are currently two implemented
 | |
| options for resetting state:
 | |
| 1. Reboot the guest between runs.
 | |
|    Pros: Straightforward and fast for simple fuzz targets.
 | |
|    Cons: Depending on the device, does not reset all device state. If the
 | |
|    device requires some initialization prior to being ready for fuzzing
 | |
|    (common for QOS-based targets), this initialization needs to be done after
 | |
|    each reboot.
 | |
|    Example target: i440fx-qtest-reboot-fuzz
 | |
| 2. Run each test case in a separate forked process and copy the coverage
 | |
|    information back to the parent. This is fairly similar to AFL's "deferred"
 | |
|    fork-server mode [3]
 | |
|    Pros: Relatively fast. Devices only need to be initialized once. No need
 | |
|    to do slow reboots or vmloads.
 | |
|    Cons: Not officially supported by libfuzzer. Does not work well for devices
 | |
|    that rely on dedicated threads.
 | |
|    Example target: virtio-net-fork-fuzz
 |