183 lines
8.7 KiB
ReStructuredText
183 lines
8.7 KiB
ReStructuredText
|
###############
|
||
|
Timerlat tracer
|
||
|
###############
|
||
|
|
||
|
The timerlat tracer aims to help the preemptive kernel developers to
|
||
|
find sources of wakeup latencies of real-time threads. Like cyclictest,
|
||
|
the tracer sets a periodic timer that wakes up a thread. The thread then
|
||
|
computes a *wakeup latency* value as the difference between the *current
|
||
|
time* and the *absolute time* that the timer was set to expire. The main
|
||
|
goal of timerlat is tracing in such a way to help kernel developers.
|
||
|
|
||
|
Usage
|
||
|
-----
|
||
|
|
||
|
Write the ASCII text "timerlat" into the current_tracer file of the
|
||
|
tracing system (generally mounted at /sys/kernel/tracing).
|
||
|
|
||
|
For example::
|
||
|
|
||
|
[root@f32 ~]# cd /sys/kernel/tracing/
|
||
|
[root@f32 tracing]# echo timerlat > current_tracer
|
||
|
|
||
|
It is possible to follow the trace by reading the trace file::
|
||
|
|
||
|
[root@f32 tracing]# cat trace
|
||
|
# tracer: timerlat
|
||
|
#
|
||
|
# _-----=> irqs-off
|
||
|
# / _----=> need-resched
|
||
|
# | / _---=> hardirq/softirq
|
||
|
# || / _--=> preempt-depth
|
||
|
# || /
|
||
|
# |||| ACTIVATION
|
||
|
# TASK-PID CPU# |||| TIMESTAMP ID CONTEXT LATENCY
|
||
|
# | | | |||| | | | |
|
||
|
<idle>-0 [000] d.h1 54.029328: #1 context irq timer_latency 932 ns
|
||
|
<...>-867 [000] .... 54.029339: #1 context thread timer_latency 11700 ns
|
||
|
<idle>-0 [001] dNh1 54.029346: #1 context irq timer_latency 2833 ns
|
||
|
<...>-868 [001] .... 54.029353: #1 context thread timer_latency 9820 ns
|
||
|
<idle>-0 [000] d.h1 54.030328: #2 context irq timer_latency 769 ns
|
||
|
<...>-867 [000] .... 54.030330: #2 context thread timer_latency 3070 ns
|
||
|
<idle>-0 [001] d.h1 54.030344: #2 context irq timer_latency 935 ns
|
||
|
<...>-868 [001] .... 54.030347: #2 context thread timer_latency 4351 ns
|
||
|
|
||
|
|
||
|
The tracer creates a per-cpu kernel thread with real-time priority that
|
||
|
prints two lines at every activation. The first is the *timer latency*
|
||
|
observed at the *hardirq* context before the activation of the thread.
|
||
|
The second is the *timer latency* observed by the thread. The ACTIVATION
|
||
|
ID field serves to relate the *irq* execution to its respective *thread*
|
||
|
execution.
|
||
|
|
||
|
The *irq*/*thread* splitting is important to clarify in which context
|
||
|
the unexpected high value is coming from. The *irq* context can be
|
||
|
delayed by hardware-related actions, such as SMIs, NMIs, IRQs,
|
||
|
or by thread masking interrupts. Once the timer happens, the delay
|
||
|
can also be influenced by blocking caused by threads. For example, by
|
||
|
postponing the scheduler execution via preempt_disable(), scheduler
|
||
|
execution, or masking interrupts. Threads can also be delayed by the
|
||
|
interference from other threads and IRQs.
|
||
|
|
||
|
Tracer options
|
||
|
---------------------
|
||
|
|
||
|
The timerlat tracer is built on top of osnoise tracer.
|
||
|
So its configuration is also done in the osnoise/ config
|
||
|
directory. The timerlat configs are:
|
||
|
|
||
|
- cpus: CPUs at which a timerlat thread will execute.
|
||
|
- timerlat_period_us: the period of the timerlat thread.
|
||
|
- stop_tracing_us: stop the system tracing if a
|
||
|
timer latency at the *irq* context higher than the configured
|
||
|
value happens. Writing 0 disables this option.
|
||
|
- stop_tracing_total_us: stop the system tracing if a
|
||
|
timer latency at the *thread* context is higher than the configured
|
||
|
value happens. Writing 0 disables this option.
|
||
|
- print_stack: save the stack of the IRQ occurrence. The stack is printed
|
||
|
after the *thread context* event, or at the IRQ handler if *stop_tracing_us*
|
||
|
is hit.
|
||
|
|
||
|
timerlat and osnoise
|
||
|
----------------------------
|
||
|
|
||
|
The timerlat can also take advantage of the osnoise: traceevents.
|
||
|
For example::
|
||
|
|
||
|
[root@f32 ~]# cd /sys/kernel/tracing/
|
||
|
[root@f32 tracing]# echo timerlat > current_tracer
|
||
|
[root@f32 tracing]# echo 1 > events/osnoise/enable
|
||
|
[root@f32 tracing]# echo 25 > osnoise/stop_tracing_total_us
|
||
|
[root@f32 tracing]# tail -10 trace
|
||
|
cc1-87882 [005] d..h... 548.771078: #402268 context irq timer_latency 13585 ns
|
||
|
cc1-87882 [005] dNLh1.. 548.771082: irq_noise: local_timer:236 start 548.771077442 duration 7597 ns
|
||
|
cc1-87882 [005] dNLh2.. 548.771099: irq_noise: qxl:21 start 548.771085017 duration 7139 ns
|
||
|
cc1-87882 [005] d...3.. 548.771102: thread_noise: cc1:87882 start 548.771078243 duration 9909 ns
|
||
|
timerlat/5-1035 [005] ....... 548.771104: #402268 context thread timer_latency 39960 ns
|
||
|
|
||
|
In this case, the root cause of the timer latency does not point to a
|
||
|
single cause but to multiple ones. Firstly, the timer IRQ was delayed
|
||
|
for 13 us, which may point to a long IRQ disabled section (see IRQ
|
||
|
stacktrace section). Then the timer interrupt that wakes up the timerlat
|
||
|
thread took 7597 ns, and the qxl:21 device IRQ took 7139 ns. Finally,
|
||
|
the cc1 thread noise took 9909 ns of time before the context switch.
|
||
|
Such pieces of evidence are useful for the developer to use other
|
||
|
tracing methods to figure out how to debug and optimize the system.
|
||
|
|
||
|
It is worth mentioning that the *duration* values reported
|
||
|
by the osnoise: events are *net* values. For example, the
|
||
|
thread_noise does not include the duration of the overhead caused
|
||
|
by the IRQ execution (which indeed accounted for 12736 ns). But
|
||
|
the values reported by the timerlat tracer (timerlat_latency)
|
||
|
are *gross* values.
|
||
|
|
||
|
The art below illustrates a CPU timeline and how the timerlat tracer
|
||
|
observes it at the top and the osnoise: events at the bottom. Each "-"
|
||
|
in the timelines means circa 1 us, and the time moves ==>::
|
||
|
|
||
|
External timer irq thread
|
||
|
clock latency latency
|
||
|
event 13585 ns 39960 ns
|
||
|
| ^ ^
|
||
|
v | |
|
||
|
|-------------| |
|
||
|
|-------------+-------------------------|
|
||
|
^ ^
|
||
|
========================================================================
|
||
|
[tmr irq] [dev irq]
|
||
|
[another thread...^ v..^ v.......][timerlat/ thread] <-- CPU timeline
|
||
|
=========================================================================
|
||
|
|-------| |-------|
|
||
|
|--^ v-------|
|
||
|
| | |
|
||
|
| | + thread_noise: 9909 ns
|
||
|
| +-> irq_noise: 6139 ns
|
||
|
+-> irq_noise: 7597 ns
|
||
|
|
||
|
IRQ stacktrace
|
||
|
---------------------------
|
||
|
|
||
|
The osnoise/print_stack option is helpful for the cases in which a thread
|
||
|
noise causes the major factor for the timer latency, because of preempt or
|
||
|
irq disabled. For example::
|
||
|
|
||
|
[root@f32 tracing]# echo 500 > osnoise/stop_tracing_total_us
|
||
|
[root@f32 tracing]# echo 500 > osnoise/print_stack
|
||
|
[root@f32 tracing]# echo timerlat > current_tracer
|
||
|
[root@f32 tracing]# tail -21 per_cpu/cpu7/trace
|
||
|
insmod-1026 [007] dN.h1.. 200.201948: irq_noise: local_timer:236 start 200.201939376 duration 7872 ns
|
||
|
insmod-1026 [007] d..h1.. 200.202587: #29800 context irq timer_latency 1616 ns
|
||
|
insmod-1026 [007] dN.h2.. 200.202598: irq_noise: local_timer:236 start 200.202586162 duration 11855 ns
|
||
|
insmod-1026 [007] dN.h3.. 200.202947: irq_noise: local_timer:236 start 200.202939174 duration 7318 ns
|
||
|
insmod-1026 [007] d...3.. 200.203444: thread_noise: insmod:1026 start 200.202586933 duration 838681 ns
|
||
|
timerlat/7-1001 [007] ....... 200.203445: #29800 context thread timer_latency 859978 ns
|
||
|
timerlat/7-1001 [007] ....1.. 200.203446: <stack trace>
|
||
|
=> timerlat_irq
|
||
|
=> __hrtimer_run_queues
|
||
|
=> hrtimer_interrupt
|
||
|
=> __sysvec_apic_timer_interrupt
|
||
|
=> asm_call_irq_on_stack
|
||
|
=> sysvec_apic_timer_interrupt
|
||
|
=> asm_sysvec_apic_timer_interrupt
|
||
|
=> delay_tsc
|
||
|
=> dummy_load_1ms_pd_init
|
||
|
=> do_one_initcall
|
||
|
=> do_init_module
|
||
|
=> __do_sys_finit_module
|
||
|
=> do_syscall_64
|
||
|
=> entry_SYSCALL_64_after_hwframe
|
||
|
|
||
|
In this case, it is possible to see that the thread added the highest
|
||
|
contribution to the *timer latency* and the stack trace, saved during
|
||
|
the timerlat IRQ handler, points to a function named
|
||
|
dummy_load_1ms_pd_init, which had the following code (on purpose)::
|
||
|
|
||
|
static int __init dummy_load_1ms_pd_init(void)
|
||
|
{
|
||
|
preempt_disable();
|
||
|
mdelay(1);
|
||
|
preempt_enable();
|
||
|
return 0;
|
||
|
|
||
|
}
|