 7a42e6885a
			
		
	
	
		7a42e6885a
		
	
	
	
	
		
			
			Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Message-Id: <20220420132624.2439741-13-marcandre.lureau@redhat.com>
		
			
				
	
	
		
			416 lines
		
	
	
		
			15 KiB
		
	
	
	
		
			ReStructuredText
		
	
	
	
	
	
			
		
		
	
	
			416 lines
		
	
	
		
			15 KiB
		
	
	
	
		
			ReStructuredText
		
	
	
	
	
	
| .. _tracing:
 | |
| 
 | |
| =======
 | |
| Tracing
 | |
| =======
 | |
| 
 | |
| Introduction
 | |
| ============
 | |
| 
 | |
| This document describes the tracing infrastructure in QEMU and how to use it
 | |
| for debugging, profiling, and observing execution.
 | |
| 
 | |
| Quickstart
 | |
| ==========
 | |
| 
 | |
| Enable tracing of ``memory_region_ops_read`` and ``memory_region_ops_write``
 | |
| events::
 | |
| 
 | |
|     $ qemu --trace "memory_region_ops_*" ...
 | |
|     ...
 | |
|     719585@1608130130.441188:memory_region_ops_read cpu 0 mr 0x562fdfbb3820 addr 0x3cc value 0x67 size 1
 | |
|     719585@1608130130.441190:memory_region_ops_write cpu 0 mr 0x562fdfbd2f00 addr 0x3d4 value 0x70e size 2
 | |
| 
 | |
| This output comes from the "log" trace backend that is enabled by default when
 | |
| ``./configure --enable-trace-backends=BACKENDS`` was not explicitly specified.
 | |
| 
 | |
| Multiple patterns can be specified by repeating the ``--trace`` option::
 | |
| 
 | |
|     $ qemu --trace "kvm_*" --trace "virtio_*" ...
 | |
| 
 | |
| When patterns are used frequently it is more convenient to store them in a
 | |
| file to avoid long command-line options::
 | |
| 
 | |
|     $ echo "memory_region_ops_*" >/tmp/events
 | |
|     $ echo "kvm_*" >>/tmp/events
 | |
|     $ qemu --trace events=/tmp/events ...
 | |
| 
 | |
| Trace events
 | |
| ============
 | |
| 
 | |
| Sub-directory setup
 | |
| -------------------
 | |
| 
 | |
| Each directory in the source tree can declare a set of trace events in a local
 | |
| "trace-events" file. All directories which contain "trace-events" files must be
 | |
| listed in the "trace_events_subdirs" variable in the top level meson.build
 | |
| file. During build, the "trace-events" file in each listed subdirectory will be
 | |
| processed by the "tracetool" script to generate code for the trace events.
 | |
| 
 | |
| The individual "trace-events" files are merged into a "trace-events-all" file,
 | |
| which is also installed into "/usr/share/qemu".
 | |
| This merged file is to be used by the "simpletrace.py" script to later analyse
 | |
| traces in the simpletrace data format.
 | |
| 
 | |
| The following files are automatically generated in <builddir>/trace/ during the
 | |
| build:
 | |
| 
 | |
|  - trace-<subdir>.c - the trace event state declarations
 | |
|  - trace-<subdir>.h - the trace event enums and probe functions
 | |
|  - trace-dtrace-<subdir>.h - DTrace event probe specification
 | |
|  - trace-dtrace-<subdir>.dtrace - DTrace event probe helper declaration
 | |
|  - trace-dtrace-<subdir>.o - binary DTrace provider (generated by dtrace)
 | |
|  - trace-ust-<subdir>.h - UST event probe helper declarations
 | |
| 
 | |
| Here <subdir> is the sub-directory path with '/' replaced by '_'. For example,
 | |
| "accel/kvm" becomes "accel_kvm" and the final filename for "trace-<subdir>.c"
 | |
| becomes "trace-accel_kvm.c".
 | |
| 
 | |
| Source files in the source tree do not directly include generated files in
 | |
| "<builddir>/trace/". Instead they #include the local "trace.h" file, without
 | |
| any sub-directory path prefix. eg io/channel-buffer.c would do::
 | |
| 
 | |
|   #include "trace.h"
 | |
| 
 | |
| The "io/trace.h" file must be created manually with an #include of the
 | |
| corresponding "trace/trace-<subdir>.h" file that will be generated in the
 | |
| builddir::
 | |
| 
 | |
|   $ echo '#include "trace/trace-io.h"' >io/trace.h
 | |
| 
 | |
| While it is possible to include a trace.h file from outside a source file's own
 | |
| sub-directory, this is discouraged in general. It is strongly preferred that
 | |
| all events be declared directly in the sub-directory that uses them. The only
 | |
| exception is where there are some shared trace events defined in the top level
 | |
| directory trace-events file.  The top level directory generates trace files
 | |
| with a filename prefix of "trace/trace-root" instead of just "trace". This is
 | |
| to avoid ambiguity between a trace.h in the current directory, vs the top level
 | |
| directory.
 | |
| 
 | |
| Using trace events
 | |
| ------------------
 | |
| 
 | |
| Trace events are invoked directly from source code like this::
 | |
| 
 | |
|     #include "trace.h"  /* needed for trace event prototype */
 | |
|     
 | |
|     void *qemu_vmalloc(size_t size)
 | |
|     {
 | |
|         void *ptr;
 | |
|         size_t align = QEMU_VMALLOC_ALIGN;
 | |
|      
 | |
|         if (size < align) {
 | |
|             align = getpagesize();
 | |
|         }
 | |
|         ptr = qemu_memalign(align, size);
 | |
|         trace_qemu_vmalloc(size, ptr);
 | |
|         return ptr;
 | |
|     }
 | |
| 
 | |
| Declaring trace events
 | |
| ----------------------
 | |
| 
 | |
| The "tracetool" script produces the trace.h header file which is included by
 | |
| every source file that uses trace events.  Since many source files include
 | |
| trace.h, it uses a minimum of types and other header files included to keep the
 | |
| namespace clean and compile times and dependencies down.
 | |
| 
 | |
| Trace events should use types as follows:
 | |
| 
 | |
|  * Use stdint.h types for fixed-size types.  Most offsets and guest memory
 | |
|    addresses are best represented with uint32_t or uint64_t.  Use fixed-size
 | |
|    types over primitive types whose size may change depending on the host
 | |
|    (32-bit versus 64-bit) so trace events don't truncate values or break
 | |
|    the build.
 | |
| 
 | |
|  * Use void * for pointers to structs or for arrays.  The trace.h header
 | |
|    cannot include all user-defined struct declarations and it is therefore
 | |
|    necessary to use void * for pointers to structs.
 | |
| 
 | |
|  * For everything else, use primitive scalar types (char, int, long) with the
 | |
|    appropriate signedness.
 | |
| 
 | |
|  * Avoid floating point types (float and double) because SystemTap does not
 | |
|    support them.  In most cases it is possible to round to an integer type
 | |
|    instead.  This may require scaling the value first by multiplying it by 1000
 | |
|    or the like when digits after the decimal point need to be preserved.
 | |
| 
 | |
| Format strings should reflect the types defined in the trace event.  Take
 | |
| special care to use PRId64 and PRIu64 for int64_t and uint64_t types,
 | |
| respectively.  This ensures portability between 32- and 64-bit platforms.
 | |
| Format strings must not end with a newline character.  It is the responsibility
 | |
| of backends to adapt line ending for proper logging.
 | |
| 
 | |
| Each event declaration will start with the event name, then its arguments,
 | |
| finally a format string for pretty-printing. For example::
 | |
| 
 | |
|     qemu_vmalloc(size_t size, void *ptr) "size %zu ptr %p"
 | |
|     qemu_vfree(void *ptr) "ptr %p"
 | |
| 
 | |
| 
 | |
| Hints for adding new trace events
 | |
| ---------------------------------
 | |
| 
 | |
| 1. Trace state changes in the code.  Interesting points in the code usually
 | |
|    involve a state change like starting, stopping, allocating, freeing.  State
 | |
|    changes are good trace events because they can be used to understand the
 | |
|    execution of the system.
 | |
| 
 | |
| 2. Trace guest operations.  Guest I/O accesses like reading device registers
 | |
|    are good trace events because they can be used to understand guest
 | |
|    interactions.
 | |
| 
 | |
| 3. Use correlator fields so the context of an individual line of trace output
 | |
|    can be understood.  For example, trace the pointer returned by malloc and
 | |
|    used as an argument to free.  This way mallocs and frees can be matched up.
 | |
|    Trace events with no context are not very useful.
 | |
| 
 | |
| 4. Name trace events after their function.  If there are multiple trace events
 | |
|    in one function, append a unique distinguisher at the end of the name.
 | |
| 
 | |
| Generic interface and monitor commands
 | |
| ======================================
 | |
| 
 | |
| You can programmatically query and control the state of trace events through a
 | |
| backend-agnostic interface provided by the header "trace/control.h".
 | |
| 
 | |
| Note that some of the backends do not provide an implementation for some parts
 | |
| of this interface, in which case QEMU will just print a warning (please refer to
 | |
| header "trace/control.h" to see which routines are backend-dependent).
 | |
| 
 | |
| The state of events can also be queried and modified through monitor commands:
 | |
| 
 | |
| * ``info trace-events``
 | |
|   View available trace events and their state.  State 1 means enabled, state 0
 | |
|   means disabled.
 | |
| 
 | |
| * ``trace-event NAME on|off``
 | |
|   Enable/disable a given trace event or a group of events (using wildcards).
 | |
| 
 | |
| The "--trace events=<file>" command line argument can be used to enable the
 | |
| events listed in <file> from the very beginning of the program. This file must
 | |
| contain one event name per line.
 | |
| 
 | |
| If a line in the "--trace events=<file>" file begins with a '-', the trace event
 | |
| will be disabled instead of enabled.  This is useful when a wildcard was used
 | |
| to enable an entire family of events but one noisy event needs to be disabled.
 | |
| 
 | |
| Wildcard matching is supported in both the monitor command "trace-event" and the
 | |
| events list file. That means you can enable/disable the events having a common
 | |
| prefix in a batch. For example, virtio-blk trace events could be enabled using
 | |
| the following monitor command::
 | |
| 
 | |
|     trace-event virtio_blk_* on
 | |
| 
 | |
| Trace backends
 | |
| ==============
 | |
| 
 | |
| The "tracetool" script automates tedious trace event code generation and also
 | |
| keeps the trace event declarations independent of the trace backend.  The trace
 | |
| events are not tightly coupled to a specific trace backend, such as LTTng or
 | |
| SystemTap.  Support for trace backends can be added by extending the "tracetool"
 | |
| script.
 | |
| 
 | |
| The trace backends are chosen at configure time::
 | |
| 
 | |
|     ./configure --enable-trace-backends=simple,dtrace
 | |
| 
 | |
| For a list of supported trace backends, try ./configure --help or see below.
 | |
| If multiple backends are enabled, the trace is sent to them all.
 | |
| 
 | |
| If no backends are explicitly selected, configure will default to the
 | |
| "log" backend.
 | |
| 
 | |
| The following subsections describe the supported trace backends.
 | |
| 
 | |
| Nop
 | |
| ---
 | |
| 
 | |
| The "nop" backend generates empty trace event functions so that the compiler
 | |
| can optimize out trace events completely.  This imposes no performance
 | |
| penalty.
 | |
| 
 | |
| Note that regardless of the selected trace backend, events with the "disable"
 | |
| property will be generated with the "nop" backend.
 | |
| 
 | |
| Log
 | |
| ---
 | |
| 
 | |
| The "log" backend sends trace events directly to standard error.  This
 | |
| effectively turns trace events into debug printfs.
 | |
| 
 | |
| This is the simplest backend and can be used together with existing code that
 | |
| uses DPRINTF().
 | |
| 
 | |
| The -msg timestamp=on|off command-line option controls whether or not to print
 | |
| the tid/timestamp prefix for each trace event.
 | |
| 
 | |
| Simpletrace
 | |
| -----------
 | |
| 
 | |
| The "simple" backend writes binary trace logs to a file from a thread, making
 | |
| it lower overhead than the "log" backend. A Python API is available for writing
 | |
| offline trace file analysis scripts. It may not be as powerful as
 | |
| platform-specific or third-party trace backends but it is portable and has no
 | |
| special library dependencies.
 | |
| 
 | |
| Monitor commands
 | |
| ~~~~~~~~~~~~~~~~
 | |
| 
 | |
| * ``trace-file on|off|flush|set <path>``
 | |
|   Enable/disable/flush the trace file or set the trace file name.
 | |
| 
 | |
| Analyzing trace files
 | |
| ~~~~~~~~~~~~~~~~~~~~~
 | |
| 
 | |
| The "simple" backend produces binary trace files that can be formatted with the
 | |
| simpletrace.py script.  The script takes the "trace-events-all" file and the
 | |
| binary trace::
 | |
| 
 | |
|     ./scripts/simpletrace.py trace-events-all trace-12345
 | |
| 
 | |
| You must ensure that the same "trace-events-all" file was used to build QEMU,
 | |
| otherwise trace event declarations may have changed and output will not be
 | |
| consistent.
 | |
| 
 | |
| Ftrace
 | |
| ------
 | |
| 
 | |
| The "ftrace" backend writes trace data to ftrace marker. This effectively
 | |
| sends trace events to ftrace ring buffer, and you can compare qemu trace
 | |
| data and kernel(especially kvm.ko when using KVM) trace data.
 | |
| 
 | |
| if you use KVM, enable kvm events in ftrace::
 | |
| 
 | |
|    # echo 1 > /sys/kernel/debug/tracing/events/kvm/enable
 | |
| 
 | |
| After running qemu by root user, you can get the trace::
 | |
| 
 | |
|    # cat /sys/kernel/debug/tracing/trace
 | |
| 
 | |
| Restriction: "ftrace" backend is restricted to Linux only.
 | |
| 
 | |
| Syslog
 | |
| ------
 | |
| 
 | |
| The "syslog" backend sends trace events using the POSIX syslog API. The log
 | |
| is opened specifying the LOG_DAEMON facility and LOG_PID option (so events
 | |
| are tagged with the pid of the particular QEMU process that generated
 | |
| them). All events are logged at LOG_INFO level.
 | |
| 
 | |
| NOTE: syslog may squash duplicate consecutive trace events and apply rate
 | |
|       limiting.
 | |
| 
 | |
| Restriction: "syslog" backend is restricted to POSIX compliant OS.
 | |
| 
 | |
| LTTng Userspace Tracer
 | |
| ----------------------
 | |
| 
 | |
| The "ust" backend uses the LTTng Userspace Tracer library.  There are no
 | |
| monitor commands built into QEMU, instead UST utilities should be used to list,
 | |
| enable/disable, and dump traces.
 | |
| 
 | |
| Package lttng-tools is required for userspace tracing. You must ensure that the
 | |
| current user belongs to the "tracing" group, or manually launch the
 | |
| lttng-sessiond daemon for the current user prior to running any instance of
 | |
| QEMU.
 | |
| 
 | |
| While running an instrumented QEMU, LTTng should be able to list all available
 | |
| events::
 | |
| 
 | |
|     lttng list -u
 | |
| 
 | |
| Create tracing session::
 | |
| 
 | |
|     lttng create mysession
 | |
| 
 | |
| Enable events::
 | |
| 
 | |
|     lttng enable-event qemu:g_malloc -u
 | |
| 
 | |
| Where the events can either be a comma-separated list of events, or "-a" to
 | |
| enable all tracepoint events. Start and stop tracing as needed::
 | |
| 
 | |
|     lttng start
 | |
|     lttng stop
 | |
| 
 | |
| View the trace::
 | |
| 
 | |
|     lttng view
 | |
| 
 | |
| Destroy tracing session::
 | |
| 
 | |
|     lttng destroy
 | |
| 
 | |
| Babeltrace can be used at any later time to view the trace::
 | |
| 
 | |
|     babeltrace $HOME/lttng-traces/mysession-<date>-<time>
 | |
| 
 | |
| SystemTap
 | |
| ---------
 | |
| 
 | |
| The "dtrace" backend uses DTrace sdt probes but has only been tested with
 | |
| SystemTap.  When SystemTap support is detected a .stp file with wrapper probes
 | |
| is generated to make use in scripts more convenient.  This step can also be
 | |
| performed manually after a build in order to change the binary name in the .stp
 | |
| probes::
 | |
| 
 | |
|     scripts/tracetool.py --backends=dtrace --format=stap \
 | |
|                          --binary path/to/qemu-binary \
 | |
|                          --target-type system \
 | |
|                          --target-name x86_64 \
 | |
|                          --group=all \
 | |
|                          trace-events-all \
 | |
|                          qemu.stp
 | |
| 
 | |
| To facilitate simple usage of systemtap where there merely needs to be printf
 | |
| logging of certain probes, a helper script "qemu-trace-stap" is provided.
 | |
| Consult its manual page for guidance on its usage.
 | |
| 
 | |
| Trace event properties
 | |
| ======================
 | |
| 
 | |
| Each event in the "trace-events-all" file can be prefixed with a space-separated
 | |
| list of zero or more of the following event properties.
 | |
| 
 | |
| "disable"
 | |
| ---------
 | |
| 
 | |
| If a specific trace event is going to be invoked a huge number of times, this
 | |
| might have a noticeable performance impact even when the event is
 | |
| programmatically disabled.
 | |
| 
 | |
| In this case you should declare such event with the "disable" property. This
 | |
| will effectively disable the event at compile time (by using the "nop" backend),
 | |
| thus having no performance impact at all on regular builds (i.e., unless you
 | |
| edit the "trace-events-all" file).
 | |
| 
 | |
| In addition, there might be cases where relatively complex computations must be
 | |
| performed to generate values that are only used as arguments for a trace
 | |
| function. In these cases you can use 'trace_event_get_state_backends()' to
 | |
| guard such computations, so they are skipped if the event has been either
 | |
| compile-time disabled or run-time disabled. If the event is compile-time
 | |
| disabled, this check will have no performance impact.
 | |
| 
 | |
| ::
 | |
| 
 | |
|     #include "trace.h"  /* needed for trace event prototype */
 | |
|     
 | |
|     void *qemu_vmalloc(size_t size)
 | |
|     {
 | |
|         void *ptr;
 | |
|         size_t align = QEMU_VMALLOC_ALIGN;
 | |
|     
 | |
|         if (size < align) {
 | |
|             align = getpagesize();
 | |
|         }
 | |
|         ptr = qemu_memalign(align, size);
 | |
|         if (trace_event_get_state_backends(TRACE_QEMU_VMALLOC)) {
 | |
|             void *complex;
 | |
|             /* some complex computations to produce the 'complex' value */
 | |
|             trace_qemu_vmalloc(size, ptr, complex);
 | |
|         }
 | |
|         return ptr;
 | |
|     }
 | |
| 
 |