docs/devel/multiple-iothreads: Convert to rST format
Convert docs/devel/multiple-iothreads.txt to rST format. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Message-id: 20240816132212.3602106-5-peter.maydell@linaro.org
This commit is contained in:
parent
362dbb4f3f
commit
4f0b3e0b95
@ -21,3 +21,4 @@ Details about QEMU's various subsystems including how to add features to them.
|
|||||||
writing-monitor-commands
|
writing-monitor-commands
|
||||||
virtio-backends
|
virtio-backends
|
||||||
crypto
|
crypto
|
||||||
|
multiple-iothreads
|
||||||
|
139
docs/devel/multiple-iothreads.rst
Normal file
139
docs/devel/multiple-iothreads.rst
Normal file
@ -0,0 +1,139 @@
|
|||||||
|
Using Multiple ``IOThread``\ s
|
||||||
|
==============================
|
||||||
|
|
||||||
|
..
|
||||||
|
Copyright (c) 2014-2017 Red Hat Inc.
|
||||||
|
|
||||||
|
This work is licensed under the terms of the GNU GPL, version 2 or later. See
|
||||||
|
the COPYING file in the top-level directory.
|
||||||
|
|
||||||
|
|
||||||
|
This document explains the ``IOThread`` feature and how to write code that runs
|
||||||
|
outside the BQL.
|
||||||
|
|
||||||
|
The main loop and ``IOThread``\ s
|
||||||
|
---------------------------------
|
||||||
|
QEMU is an event-driven program that can do several things at once using an
|
||||||
|
event loop. The VNC server and the QMP monitor are both processed from the
|
||||||
|
same event loop, which monitors their file descriptors until they become
|
||||||
|
readable and then invokes a callback.
|
||||||
|
|
||||||
|
The default event loop is called the main loop (see ``main-loop.c``). It is
|
||||||
|
possible to create additional event loop threads using
|
||||||
|
``-object iothread,id=my-iothread``.
|
||||||
|
|
||||||
|
Side note: The main loop and ``IOThread`` are both event loops but their code is
|
||||||
|
not shared completely. Sometimes it is useful to remember that although they
|
||||||
|
are conceptually similar they are currently not interchangeable.
|
||||||
|
|
||||||
|
Why ``IOThread``\ s are useful
|
||||||
|
------------------------------
|
||||||
|
``IOThread``\ s allow the user to control the placement of work. The main loop is a
|
||||||
|
scalability bottleneck on hosts with many CPUs. Work can be spread across
|
||||||
|
several ``IOThread``\ s instead of just one main loop. When set up correctly this
|
||||||
|
can improve I/O latency and reduce jitter seen by the guest.
|
||||||
|
|
||||||
|
The main loop is also deeply associated with the BQL, which is a
|
||||||
|
scalability bottleneck in itself. vCPU threads and the main loop use the BQL
|
||||||
|
to serialize execution of QEMU code. This mutex is necessary because a lot of
|
||||||
|
QEMU's code historically was not thread-safe.
|
||||||
|
|
||||||
|
The fact that all I/O processing is done in a single main loop and that the
|
||||||
|
BQL is contended by all vCPU threads and the main loop explain
|
||||||
|
why it is desirable to place work into ``IOThread``\ s.
|
||||||
|
|
||||||
|
The experimental ``virtio-blk`` data-plane implementation has been benchmarked and
|
||||||
|
shows these effects:
|
||||||
|
ftp://public.dhe.ibm.com/linux/pdfs/KVM_Virtualized_IO_Performance_Paper.pdf
|
||||||
|
|
||||||
|
.. _how-to-program:
|
||||||
|
|
||||||
|
How to program for ``IOThread``\ s
|
||||||
|
----------------------------------
|
||||||
|
The main difference between legacy code and new code that can run in an
|
||||||
|
``IOThread`` is dealing explicitly with the event loop object, ``AioContext``
|
||||||
|
(see ``include/block/aio.h``). Code that only works in the main loop
|
||||||
|
implicitly uses the main loop's ``AioContext``. Code that supports running
|
||||||
|
in ``IOThread``\ s must be aware of its ``AioContext``.
|
||||||
|
|
||||||
|
AioContext supports the following services:
|
||||||
|
* File descriptor monitoring (read/write/error on POSIX hosts)
|
||||||
|
* Event notifiers (inter-thread signalling)
|
||||||
|
* Timers
|
||||||
|
* Bottom Halves (BH) deferred callbacks
|
||||||
|
|
||||||
|
There are several old APIs that use the main loop AioContext:
|
||||||
|
* LEGACY ``qemu_aio_set_fd_handler()`` - monitor a file descriptor
|
||||||
|
* LEGACY ``qemu_aio_set_event_notifier()`` - monitor an event notifier
|
||||||
|
* LEGACY ``timer_new_ms()`` - create a timer
|
||||||
|
* LEGACY ``qemu_bh_new()`` - create a BH
|
||||||
|
* LEGACY ``qemu_bh_new_guarded()`` - create a BH with a device re-entrancy guard
|
||||||
|
* LEGACY ``qemu_aio_wait()`` - run an event loop iteration
|
||||||
|
|
||||||
|
Since they implicitly work on the main loop they cannot be used in code that
|
||||||
|
runs in an ``IOThread``. They might cause a crash or deadlock if called from an
|
||||||
|
``IOThread`` since the BQL is not held.
|
||||||
|
|
||||||
|
Instead, use the ``AioContext`` functions directly (see ``include/block/aio.h``):
|
||||||
|
* ``aio_set_fd_handler()`` - monitor a file descriptor
|
||||||
|
* ``aio_set_event_notifier()`` - monitor an event notifier
|
||||||
|
* ``aio_timer_new()`` - create a timer
|
||||||
|
* ``aio_bh_new()`` - create a BH
|
||||||
|
* ``aio_bh_new_guarded()`` - create a BH with a device re-entrancy guard
|
||||||
|
* ``aio_poll()`` - run an event loop iteration
|
||||||
|
|
||||||
|
The ``qemu_bh_new_guarded``/``aio_bh_new_guarded`` APIs accept a
|
||||||
|
``MemReentrancyGuard``
|
||||||
|
argument, which is used to check for and prevent re-entrancy problems. For
|
||||||
|
BHs associated with devices, the reentrancy-guard is contained in the
|
||||||
|
corresponding ``DeviceState`` and named ``mem_reentrancy_guard``.
|
||||||
|
|
||||||
|
The ``AioContext`` can be obtained from the ``IOThread`` using
|
||||||
|
``iothread_get_aio_context()`` or for the main loop using
|
||||||
|
``qemu_get_aio_context()``. Code that takes an ``AioContext`` argument
|
||||||
|
works both in ``IOThread``\ s or the main loop, depending on which ``AioContext``
|
||||||
|
instance the caller passes in.
|
||||||
|
|
||||||
|
How to synchronize with an ``IOThread``
|
||||||
|
---------------------------------------
|
||||||
|
Variables that can be accessed by multiple threads require some form of
|
||||||
|
synchronization such as ``qemu_mutex_lock()``, ``rcu_read_lock()``, etc.
|
||||||
|
|
||||||
|
``AioContext`` functions like ``aio_set_fd_handler()``,
|
||||||
|
``aio_set_event_notifier()``, ``aio_bh_new()``, and ``aio_timer_new()``
|
||||||
|
are thread-safe. They can be used to trigger activity in an ``IOThread``.
|
||||||
|
|
||||||
|
Side note: the best way to schedule a function call across threads is to call
|
||||||
|
``aio_bh_schedule_oneshot()``.
|
||||||
|
|
||||||
|
The main loop thread can wait synchronously for a condition using
|
||||||
|
``AIO_WAIT_WHILE()``.
|
||||||
|
|
||||||
|
``AioContext`` and the block layer
|
||||||
|
----------------------------------
|
||||||
|
The ``AioContext`` originates from the QEMU block layer, even though nowadays
|
||||||
|
``AioContext`` is a generic event loop that can be used by any QEMU subsystem.
|
||||||
|
|
||||||
|
The block layer has support for ``AioContext`` integrated. Each
|
||||||
|
``BlockDriverState`` is associated with an ``AioContext`` using
|
||||||
|
``bdrv_try_change_aio_context()`` and ``bdrv_get_aio_context()``.
|
||||||
|
This allows block layer code to process I/O inside the
|
||||||
|
right ``AioContext``. Other subsystems may wish to follow a similar approach.
|
||||||
|
|
||||||
|
Block layer code must therefore expect to run in an ``IOThread`` and avoid using
|
||||||
|
old APIs that implicitly use the main loop. See
|
||||||
|
`How to program for IOThreads`_ for information on how to do that.
|
||||||
|
|
||||||
|
Code running in the monitor typically needs to ensure that past
|
||||||
|
requests from the guest are completed. When a block device is running
|
||||||
|
in an ``IOThread``, the ``IOThread`` can also process requests from the guest
|
||||||
|
(via ioeventfd). To achieve both objects, wrap the code between
|
||||||
|
``bdrv_drained_begin()`` and ``bdrv_drained_end()``, thus creating a "drained
|
||||||
|
section".
|
||||||
|
|
||||||
|
Long-running jobs (usually in the form of coroutines) are often scheduled in
|
||||||
|
the ``BlockDriverState``'s ``AioContext``. The functions
|
||||||
|
``bdrv_add``/``remove_aio_context_notifier``, or alternatively
|
||||||
|
``blk_add``/``remove_aio_context_notifier`` if you use ``BlockBackends``,
|
||||||
|
can be used to get a notification whenever ``bdrv_try_change_aio_context()``
|
||||||
|
moves a ``BlockDriverState`` to a different ``AioContext``.
|
@ -1,130 +0,0 @@
|
|||||||
Copyright (c) 2014-2017 Red Hat Inc.
|
|
||||||
|
|
||||||
This work is licensed under the terms of the GNU GPL, version 2 or later. See
|
|
||||||
the COPYING file in the top-level directory.
|
|
||||||
|
|
||||||
|
|
||||||
This document explains the IOThread feature and how to write code that runs
|
|
||||||
outside the BQL.
|
|
||||||
|
|
||||||
The main loop and IOThreads
|
|
||||||
---------------------------
|
|
||||||
QEMU is an event-driven program that can do several things at once using an
|
|
||||||
event loop. The VNC server and the QMP monitor are both processed from the
|
|
||||||
same event loop, which monitors their file descriptors until they become
|
|
||||||
readable and then invokes a callback.
|
|
||||||
|
|
||||||
The default event loop is called the main loop (see main-loop.c). It is
|
|
||||||
possible to create additional event loop threads using -object
|
|
||||||
iothread,id=my-iothread.
|
|
||||||
|
|
||||||
Side note: The main loop and IOThread are both event loops but their code is
|
|
||||||
not shared completely. Sometimes it is useful to remember that although they
|
|
||||||
are conceptually similar they are currently not interchangeable.
|
|
||||||
|
|
||||||
Why IOThreads are useful
|
|
||||||
------------------------
|
|
||||||
IOThreads allow the user to control the placement of work. The main loop is a
|
|
||||||
scalability bottleneck on hosts with many CPUs. Work can be spread across
|
|
||||||
several IOThreads instead of just one main loop. When set up correctly this
|
|
||||||
can improve I/O latency and reduce jitter seen by the guest.
|
|
||||||
|
|
||||||
The main loop is also deeply associated with the BQL, which is a
|
|
||||||
scalability bottleneck in itself. vCPU threads and the main loop use the BQL
|
|
||||||
to serialize execution of QEMU code. This mutex is necessary because a lot of
|
|
||||||
QEMU's code historically was not thread-safe.
|
|
||||||
|
|
||||||
The fact that all I/O processing is done in a single main loop and that the
|
|
||||||
BQL is contended by all vCPU threads and the main loop explain
|
|
||||||
why it is desirable to place work into IOThreads.
|
|
||||||
|
|
||||||
The experimental virtio-blk data-plane implementation has been benchmarked and
|
|
||||||
shows these effects:
|
|
||||||
ftp://public.dhe.ibm.com/linux/pdfs/KVM_Virtualized_IO_Performance_Paper.pdf
|
|
||||||
|
|
||||||
How to program for IOThreads
|
|
||||||
----------------------------
|
|
||||||
The main difference between legacy code and new code that can run in an
|
|
||||||
IOThread is dealing explicitly with the event loop object, AioContext
|
|
||||||
(see include/block/aio.h). Code that only works in the main loop
|
|
||||||
implicitly uses the main loop's AioContext. Code that supports running
|
|
||||||
in IOThreads must be aware of its AioContext.
|
|
||||||
|
|
||||||
AioContext supports the following services:
|
|
||||||
* File descriptor monitoring (read/write/error on POSIX hosts)
|
|
||||||
* Event notifiers (inter-thread signalling)
|
|
||||||
* Timers
|
|
||||||
* Bottom Halves (BH) deferred callbacks
|
|
||||||
|
|
||||||
There are several old APIs that use the main loop AioContext:
|
|
||||||
* LEGACY qemu_aio_set_fd_handler() - monitor a file descriptor
|
|
||||||
* LEGACY qemu_aio_set_event_notifier() - monitor an event notifier
|
|
||||||
* LEGACY timer_new_ms() - create a timer
|
|
||||||
* LEGACY qemu_bh_new() - create a BH
|
|
||||||
* LEGACY qemu_bh_new_guarded() - create a BH with a device re-entrancy guard
|
|
||||||
* LEGACY qemu_aio_wait() - run an event loop iteration
|
|
||||||
|
|
||||||
Since they implicitly work on the main loop they cannot be used in code that
|
|
||||||
runs in an IOThread. They might cause a crash or deadlock if called from an
|
|
||||||
IOThread since the BQL is not held.
|
|
||||||
|
|
||||||
Instead, use the AioContext functions directly (see include/block/aio.h):
|
|
||||||
* aio_set_fd_handler() - monitor a file descriptor
|
|
||||||
* aio_set_event_notifier() - monitor an event notifier
|
|
||||||
* aio_timer_new() - create a timer
|
|
||||||
* aio_bh_new() - create a BH
|
|
||||||
* aio_bh_new_guarded() - create a BH with a device re-entrancy guard
|
|
||||||
* aio_poll() - run an event loop iteration
|
|
||||||
|
|
||||||
The qemu_bh_new_guarded/aio_bh_new_guarded APIs accept a "MemReentrancyGuard"
|
|
||||||
argument, which is used to check for and prevent re-entrancy problems. For
|
|
||||||
BHs associated with devices, the reentrancy-guard is contained in the
|
|
||||||
corresponding DeviceState and named "mem_reentrancy_guard".
|
|
||||||
|
|
||||||
The AioContext can be obtained from the IOThread using
|
|
||||||
iothread_get_aio_context() or for the main loop using qemu_get_aio_context().
|
|
||||||
Code that takes an AioContext argument works both in IOThreads or the main
|
|
||||||
loop, depending on which AioContext instance the caller passes in.
|
|
||||||
|
|
||||||
How to synchronize with an IOThread
|
|
||||||
-----------------------------------
|
|
||||||
Variables that can be accessed by multiple threads require some form of
|
|
||||||
synchronization such as qemu_mutex_lock(), rcu_read_lock(), etc.
|
|
||||||
|
|
||||||
AioContext functions like aio_set_fd_handler(), aio_set_event_notifier(),
|
|
||||||
aio_bh_new(), and aio_timer_new() are thread-safe. They can be used to trigger
|
|
||||||
activity in an IOThread.
|
|
||||||
|
|
||||||
Side note: the best way to schedule a function call across threads is to call
|
|
||||||
aio_bh_schedule_oneshot().
|
|
||||||
|
|
||||||
The main loop thread can wait synchronously for a condition using
|
|
||||||
AIO_WAIT_WHILE().
|
|
||||||
|
|
||||||
AioContext and the block layer
|
|
||||||
------------------------------
|
|
||||||
The AioContext originates from the QEMU block layer, even though nowadays
|
|
||||||
AioContext is a generic event loop that can be used by any QEMU subsystem.
|
|
||||||
|
|
||||||
The block layer has support for AioContext integrated. Each BlockDriverState
|
|
||||||
is associated with an AioContext using bdrv_try_change_aio_context() and
|
|
||||||
bdrv_get_aio_context(). This allows block layer code to process I/O inside the
|
|
||||||
right AioContext. Other subsystems may wish to follow a similar approach.
|
|
||||||
|
|
||||||
Block layer code must therefore expect to run in an IOThread and avoid using
|
|
||||||
old APIs that implicitly use the main loop. See the "How to program for
|
|
||||||
IOThreads" above for information on how to do that.
|
|
||||||
|
|
||||||
Code running in the monitor typically needs to ensure that past
|
|
||||||
requests from the guest are completed. When a block device is running
|
|
||||||
in an IOThread, the IOThread can also process requests from the guest
|
|
||||||
(via ioeventfd). To achieve both objects, wrap the code between
|
|
||||||
bdrv_drained_begin() and bdrv_drained_end(), thus creating a "drained
|
|
||||||
section".
|
|
||||||
|
|
||||||
Long-running jobs (usually in the form of coroutines) are often scheduled in
|
|
||||||
the BlockDriverState's AioContext. The functions
|
|
||||||
bdrv_add/remove_aio_context_notifier, or alternatively
|
|
||||||
blk_add/remove_aio_context_notifier if you use BlockBackends, can be used to
|
|
||||||
get a notification whenever bdrv_try_change_aio_context() moves a
|
|
||||||
BlockDriverState to a different AioContext.
|
|
Loading…
x
Reference in New Issue
Block a user