===========================================================================
Proper Locking Under a Preemptible Kernel: Keeping Kernel Code Preempt-Safe
===========================================================================

:Author: Robert Love <rml@tech9.net>


Introduction
============


A preemptible kernel creates new locking issues. The issues are the same as
those under SMP: concurrency and reentrancy. Thankfully, the Linux preemptible
kernel model leverages existing SMP locking mechanisms. Thus, the kernel
requires explicit additional locking for very few additional situations.

This document is for all kernel hackers. Anyone developing code in the kernel
must protect the situations described in the rules below.


RULE #1: Per-CPU data structures need explicit protection
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^


Two similar problems arise. An example code snippet::

	struct this_needs_locking tux[NR_CPUS];
	tux[smp_processor_id()] = some_value;
	/* task is preempted here... */
	something = tux[smp_processor_id()];

First, because the data is per-CPU, it may have no explicit SMP locking, yet
under preemption it still needs protection. Second, when a preempted task is
finally rescheduled, the value returned by smp_processor_id() may differ from
the value it returned before the preemption. You must protect these situations
by disabling preemption around them.

You can also use get_cpu() and put_cpu(): get_cpu() disables preemption and
returns the current CPU number, and put_cpu() re-enables preemption.


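The hazard in Rule #1 can be reproduced with a small userspace model. This is
only a sketch: ``current_cpu``, ``preempt_depth``, ``maybe_migrate()`` and the
two ``*_roundtrip()`` helpers are hypothetical stand-ins invented for
illustration, and the ``get_cpu()``/``put_cpu()`` defined here merely imitate
the behaviour of the kernel functions of the same name.

```c
#include <assert.h>

#define NR_CPUS 4

/* Userspace stand-ins for kernel state; all of this is hypothetical. */
static int current_cpu;    /* CPU the simulated task is running on */
static int preempt_depth;  /* models the preempt counter */
static int tux[NR_CPUS];   /* per-CPU data, one slot per CPU */

static int smp_processor_id(void)
{
	return current_cpu;
}

/* Imitates get_cpu(): disable preemption, then return the CPU number. */
static int get_cpu(void)
{
	preempt_depth++;
	return current_cpu;
}

/* Imitates put_cpu(): re-enable preemption. */
static void put_cpu(void)
{
	assert(preempt_depth > 0);
	preempt_depth--;
}

/* Models "task is preempted and later rescheduled on another CPU".
 * The move only happens while preemption is enabled. */
static void maybe_migrate(int new_cpu)
{
	if (preempt_depth == 0)
		current_cpu = new_cpu;
}

static void reset(void)
{
	int i;

	current_cpu = 0;
	preempt_depth = 0;
	for (i = 0; i < NR_CPUS; i++)
		tux[i] = 0;
}

/* Unprotected: the write and the read may hit different CPUs' slots. */
int unsafe_roundtrip(int value, int other_cpu)
{
	reset();
	tux[smp_processor_id()] = value;
	maybe_migrate(other_cpu);        /* task is preempted here... */
	return tux[smp_processor_id()];  /* ...and reads another slot */
}

/* Protected: the CPU number stays valid between get_cpu() and put_cpu(). */
int safe_roundtrip(int value, int other_cpu)
{
	int cpu, result;

	reset();
	cpu = get_cpu();
	tux[cpu] = value;
	maybe_migrate(other_cpu);        /* refused: preemption is disabled */
	result = tux[cpu];
	put_cpu();
	return result;
}
```

In this model the unprotected version reads back the other CPU's untouched
slot, while the protected version always reads back what it wrote.
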
RULE #2: CPU state must be protected.
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^


Under preemption, the state of the CPU must be protected. This is
arch-dependent, but includes CPU structures and state not preserved over a
context switch. For example, on x86, entering and exiting FPU mode is now a
critical section that must occur while preemption is disabled. Think what would
happen if the kernel is executing a floating-point instruction and is then
preempted. Remember, the kernel does not save FPU state except for user tasks.
Therefore, upon preemption, the FPU registers will be sold to the lowest
bidder. Thus, preemption must be disabled around such regions.

Note, some FPU functions are already explicitly preempt safe. For example,
kernel_fpu_begin() and kernel_fpu_end() will disable and enable preemption.


RULE #3: Lock acquire and release must be performed by same task
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^


A lock acquired in one task must be released by the same task. This
means you can't do oddball things like acquire a lock and go off to
play while another task releases it. If you want to do something
like this, acquire and release the lock in the same code path and
have the caller wait on an event by the other task.


Solution
========


Data protection under preemption is achieved by disabling preemption for the
duration of the critical region.

::

  preempt_enable()              decrement the preempt counter
  preempt_disable()             increment the preempt counter
  preempt_enable_no_resched()   decrement, but do not immediately preempt
  preempt_check_resched()       if needed, reschedule
  preempt_count()               return the preempt counter

The functions are nestable. In other words, you can call preempt_disable()
n times in a code path, and preemption will not be reenabled until the n-th
call to preempt_enable(). The preempt statements are defined to nothing if
preemption is not enabled.

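The nesting behaviour can be illustrated with a userspace model of the counter.
This is a sketch under stated assumptions, not the kernel implementation: the
function names mirror the table above, but the bodies, the ``need_resched``
flag handling, and the helpers ``set_need_resched()`` and
``reschedule_count()`` are simplified stand-ins written for this example.

```c
#include <assert.h>
#include <stdbool.h>

static int count;          /* models the preempt counter */
static bool need_resched;  /* set when another task should run */
static int reschedules;    /* how many times the model "rescheduled" */

int preempt_count(void)
{
	return count;
}

/* Hypothetical helper: pretend a higher-priority task became runnable. */
void set_need_resched(void)
{
	need_resched = true;
}

/* Hypothetical helper: expose the number of simulated reschedules. */
int reschedule_count(void)
{
	return reschedules;
}

/* Reschedule only if needed and only if preemption is fully enabled. */
void preempt_check_resched(void)
{
	if (count == 0 && need_resched) {
		need_resched = false;
		reschedules++;
	}
}

void preempt_disable(void)
{
	count++;
}

/* Decrement, but do not immediately preempt. */
void preempt_enable_no_resched(void)
{
	assert(count > 0);
	count--;
}

/* Decrement, then reschedule if this was the outermost enable. */
void preempt_enable(void)
{
	preempt_enable_no_resched();
	preempt_check_resched();
}
```

With three nested preempt_disable() calls, a pending reschedule in this model
is deferred until the third preempt_enable() brings the counter back to zero.
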
Note that you do not need to explicitly prevent preemption if you are holding
any locks or if interrupts are disabled, since preemption is implicitly
disabled in those cases.

But keep in mind that 'irqs disabled' is a fundamentally unsafe way of
disabling preemption: any cond_resched() or cond_resched_lock() might trigger
a reschedule if the preempt count is 0. A simple printk() might trigger a
reschedule. So use this implicit preemption-disabling property only if you
know that the affected code path does not do any of this. The best policy is to
use this only for small, atomic code that you wrote and which calls no complex
functions.

Example::

	cpucache_t *cc; /* this is per-CPU */
	preempt_disable();
	cc = cc_data(searchp);
	if (cc && cc->avail) {
		__free_block(searchp, cc_entry(cc), cc->avail);
		cc->avail = 0;
	}
	preempt_enable();
	return 0;

Notice how the preemption statements must encompass every reference to the
critical variables. Another example::

	int buf[NR_CPUS];
	set_cpu_val(buf);
	if (buf[smp_processor_id()] == -1) printk(KERN_INFO "wee!\n");
	spin_lock(&buf_lock);
	/* ... */

This code is not preempt-safe, but see how easily we can fix it by simply
moving the spin_lock up two lines, so that the lock (which implicitly disables
preemption) is held across both per-CPU accesses.


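For reference, the fixed ordering might look like the following userspace
sketch. Everything here is a stand-in written for illustration: the
``spinlock_t`` type, ``spin_lock()``, ``spin_unlock()`` and
``smp_processor_id()`` are simplified models, ``set_cpu_val()`` is a
hypothetical helper assumed to store -1 in the current CPU's slot, and
``puts()`` takes the place of printk().

```c
#include <assert.h>
#include <stdio.h>

#define NR_CPUS 4

/* Simplified stand-in for the kernel spinlock type. */
typedef struct {
	int locked;
} spinlock_t;

static int preempt_depth;  /* models the preempt counter */
static int current_cpu;    /* CPU the simulated task is running on */
static spinlock_t buf_lock;

static int smp_processor_id(void)
{
	return current_cpu;
}

/* Taking the lock implicitly disables preemption in this model. */
static void spin_lock(spinlock_t *lock)
{
	preempt_depth++;
	assert(!lock->locked);
	lock->locked = 1;
}

static void spin_unlock(spinlock_t *lock)
{
	assert(lock->locked);
	lock->locked = 0;
	preempt_depth--;
}

/* Hypothetical helper: mark the current CPU's slot with -1. */
static void set_cpu_val(int *buf)
{
	buf[smp_processor_id()] = -1;
}

int preempt_depth_now(void)
{
	return preempt_depth;
}

/* The lock is taken first, so both per-CPU accesses are covered. */
int fixed_example(void)
{
	int buf[NR_CPUS] = { 0 };
	int hit = 0;

	spin_lock(&buf_lock);
	set_cpu_val(buf);
	if (buf[smp_processor_id()] == -1) {
		puts("wee!");
		hit = 1;
	}
	/* ... */
	spin_unlock(&buf_lock);
	return hit;
}
```

The only change from the unsafe version is the position of spin_lock(); the
matching spin_unlock() is added here so that the sketch is complete.
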
Preventing preemption using interrupt disabling
===============================================


It is possible to prevent a preemption event using local_irq_disable() and
local_irq_save(). Note, when doing so, you must be very careful to not cause
an event that would set need_resched and result in a preemption check. When
in doubt, rely on locking or explicit preemption disabling.

Note in 2.5 interrupt disabling is now only per-CPU (i.e. local).

An additional concern is proper usage of local_irq_disable() and
local_irq_save(). These may be used to protect from preemption; however, on
exit, if preemption may be enabled, a test to see if preemption is required
should be done. If these are called from the spin_lock and read/write lock
macros, the right thing is done. They may also be called within a spin-lock
protected region; however, if they are ever called outside of this context, a
test for preemption should be made. Do note that calls from interrupt context
or bottom halves/tasklets are also protected by preemption locks and so may use
the versions which do not check preemption.
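
The "test for preemption on exit" described above could be sketched as follows.
This is a userspace model only: the ``local_irq_save()`` and
``local_irq_restore()`` macros, ``preempt_check_resched()``, and the
``wake_up_something()`` and ``irq_protected_update()`` helpers are all
simplified or hypothetical stand-ins, chosen to show the ordering rather than
any real kernel implementation.

```c
#include <assert.h>
#include <stdbool.h>

static bool irqs_enabled = true;  /* models the local interrupt flag */
static int preempt_depth;         /* models the preempt counter */
static bool need_resched;         /* set when another task should run */
static int reschedules;           /* how many times the model "rescheduled" */

/* Simplified stand-ins for the kernel macros of the same name. */
#define local_irq_save(flags) \
	do { (flags) = irqs_enabled; irqs_enabled = false; } while (0)
#define local_irq_restore(flags) \
	do { irqs_enabled = ((flags) != 0); } while (0)

/* Reschedule only when nothing is holding preemption off. */
static void preempt_check_resched(void)
{
	if (irqs_enabled && preempt_depth == 0 && need_resched) {
		need_resched = false;
		reschedules++;
	}
}

/* Hypothetical event inside the critical region that sets need_resched. */
static void wake_up_something(void)
{
	need_resched = true;
}

/* Protect an update with interrupt disabling, then test for preemption. */
int irq_protected_update(int *counter)
{
	unsigned long flags;

	local_irq_save(flags);
	(*counter)++;
	wake_up_something();      /* no reschedule can happen here */
	local_irq_restore(flags);
	preempt_check_resched();  /* the test on exit */
	return reschedules;
}
```

Without the final check, the pending reschedule in this model would stay
pending until some later, unrelated preemption point.
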