docs: update ivshmem device spec

Add some notes on the parts needed to use ivshmem devices: more specifically,
explain the purpose of an ivshmem server and the basic concept to use the
ivshmem devices in guests.
Move some parts of the documentation and re-organise it.

Signed-off-by: David Marchand <david.marchand@6wind.com>
Reviewed-by: Claudio Fontana <claudio.fontana@huawei.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>

parent 1e21feb628
commit 8c4ef202b9

@@ -2,30 +2,103 @@
 Device Specification for Inter-VM shared memory device
 ------------------------------------------------------
 
-The Inter-VM shared memory device is designed to share a region of memory to
-userspace in multiple virtual guests. The memory region does not belong to any
-guest, but is a POSIX memory object on the host. Optionally, the device may
-support sending interrupts to other guests sharing the same memory region.
+The Inter-VM shared memory device is designed to share a memory region (created
+on the host via the POSIX shared memory API) between multiple QEMU processes
+running different guests. In order for all guests to be able to pick up the
+shared memory area, it is modeled by QEMU as a PCI device exposing said memory
+to the guest as a PCI BAR.
+The memory region does not belong to any guest, but is a POSIX memory object on
+the host. The host can access this shared memory if needed.
+
+The device also provides an optional communication mechanism between guests
+sharing the same memory object. More details are given in the 'Guest to guest
+communication' section.
 
 
 The Inter-VM PCI device
 -----------------------
 
-*BARs*
+From the VM's point of view, the ivshmem PCI device supports three BARs.
 
-The device supports three BARs. BAR0 is a 1 Kbyte MMIO region to support
-registers. BAR1 is used for MSI-X when it is enabled in the device. BAR2 is
-used to map the shared memory object from the host. The size of BAR2 is
-specified when the guest is started and must be a power of 2 in size.
+- BAR0 is a 1 Kbyte MMIO region to support registers and interrupts when MSI is
+  not used.
+- BAR1 is used for MSI-X when it is enabled in the device.
+- BAR2 is used to access the shared memory object.
 
-*Registers*
+It is your choice how to use the device, but you must choose between two
+behaviors:
 
-The device currently supports 4 registers of 32-bits each. Registers
-are used for synchronization between guests sharing the same memory object when
-interrupts are supported (this requires using the shared memory server).
+- if you only need the shared memory part, you will map BAR2.
+  This way, you have access to the shared memory in the guest and can use it as
+  you see fit (memnic, for example, uses it in userland
+  http://dpdk.org/browse/memnic).
 
-The server assigns each VM an ID number and sends this ID number to the QEMU
-process when the guest starts.
+- BAR0 and BAR1 are used to implement an optional communication mechanism
+  through interrupts in the guests. If you need an event mechanism between the
+  guests accessing the shared memory, you will most likely want to write a
+  kernel driver that will handle interrupts. See details in the 'Guest to guest
+  communication' section.
+
+The behavior is chosen when starting your QEMU processes:
+- no communication mechanism needed: the first QEMU to start creates the shared
+  memory on the host, and subsequent QEMU processes use it.
+
+- communication mechanism needed: an ivshmem server must be started before any
+  QEMU process, then each QEMU process connects to the server's unix socket.
+
+For more details on the QEMU ivshmem parameters, see the qemu-doc documentation.
+
+
+Guest to guest communication
+----------------------------
+
+This section details the communication mechanism between the guests accessing
+the ivshmem shared memory.
+
+*ivshmem server*
+
+The server code is available in qemu.git/contrib/ivshmem-server.
+
+The server must be started on the host before any guest.
+It creates a shared memory object, then waits for clients to connect on a unix
+socket.
+
+For each client (QEMU process) that connects to the server:
+- the server assigns an ID to this client and sends this ID to it as the first
+  message,
+- the server sends an fd to the shared memory object to this client,
+- the server creates a new set of host eventfds associated with the new client
+  and sends this set to all already connected clients,
+- finally, the server sends the eventfd sets of all clients to the new
+  client.
+
+The server signals all clients when one of them disconnects.
+
+The client IDs are limited to 16 bits because of the current implementation (see
+the Doorbell register in the 'PCI device registers' subsection). Hence only 65536
+clients are supported.
+
+All the file descriptors (fd to the shared memory, eventfds for each client)
+are passed to clients using SCM_RIGHTS over the server's unix socket.
+
+Apart from the current ivshmem implementation in QEMU, an ivshmem client has
+been provided in qemu.git/contrib/ivshmem-client for debugging.
+
+*QEMU as an ivshmem client*
+
+At initialisation, when creating the ivshmem device, QEMU gets its ID from the
+server, then makes it available through the BAR0 IVPosition register for the VM
+to use (see the 'PCI device registers' subsection).
+QEMU then uses the fd to the shared memory to map it to BAR2.
+The eventfds for all other clients received from the server are stored to
+implement the BAR0 Doorbell register (see the 'PCI device registers' subsection).
+Finally, the eventfds assigned to this QEMU process are used to raise interrupts
+in this VM.
+
+*PCI device registers*
+
+From the VM's point of view, the ivshmem PCI device supports 4 registers of
+32 bits each.
 
 enum ivshmem_registers {
     IntrMask = 0,
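
The server description above says that every descriptor (the shared memory fd and each client's eventfds) reaches a client through SCM_RIGHTS over the server's unix socket. The C sketch below illustrates only that mechanism. Its message framing is an assumption and is not taken from qemu.git/contrib/ivshmem-server: each message is assumed to carry one int64_t payload (such as a client ID) plus at most one descriptor, and send_msg, recv_msg and union fd_ctrl are hypothetical names.

```c
#include <stdint.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/uio.h>

/* Control buffer sized for exactly one file descriptor; the union gives it
 * the alignment that struct cmsghdr requires. */
union fd_ctrl {
    char buf[CMSG_SPACE(sizeof(int))];
    struct cmsghdr align;
};

/* Hypothetical framing (assumption): sends one int64_t payload, plus `fd`
 * as SCM_RIGHTS ancillary data when fd >= 0. Returns 0 on success, -1 on error. */
static int send_msg(int sock, int64_t value, int fd)
{
    struct iovec iov = { .iov_base = &value, .iov_len = sizeof(value) };
    union fd_ctrl ctrl;
    struct msghdr msg;

    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;

    if (fd >= 0) {
        memset(&ctrl, 0, sizeof(ctrl));
        msg.msg_control = ctrl.buf;
        msg.msg_controllen = sizeof(ctrl.buf);

        struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
        cmsg->cmsg_level = SOL_SOCKET;
        cmsg->cmsg_type = SCM_RIGHTS;
        cmsg->cmsg_len = CMSG_LEN(sizeof(int));
        memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));
    }
    return sendmsg(sock, &msg, 0) == (ssize_t)sizeof(value) ? 0 : -1;
}

/* Receives one message. *fd is set to the passed descriptor (a new fd number
 * in this process) or to -1 if none came with the payload. */
static int recv_msg(int sock, int64_t *value, int *fd)
{
    struct iovec iov = { .iov_base = value, .iov_len = sizeof(*value) };
    union fd_ctrl ctrl;
    struct msghdr msg;

    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof(ctrl.buf);

    *fd = -1;
    if (recvmsg(sock, &msg, 0) != (ssize_t)sizeof(*value))
        return -1;

    for (struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg); cmsg != NULL;
         cmsg = CMSG_NXTHDR(&msg, cmsg)) {
        if (cmsg->cmsg_level == SOL_SOCKET && cmsg->cmsg_type == SCM_RIGHTS &&
            cmsg->cmsg_len == CMSG_LEN(sizeof(int)))
            memcpy(fd, CMSG_DATA(cmsg), sizeof(int));
    }
    return 0;
}
```

With a connected pair of unix sockets, a server calling send_msg(sock, id, shm_fd) would let the client's recv_msg obtain both the ID and a new descriptor referring to the same shared memory object; the kernel duplicates the descriptor into the receiving process.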
@@ -49,8 +122,8 @@ bit to 0 and unmasked by setting the first bit to 1.
 IVPosition Register: The IVPosition register is read-only and reports the
 guest's ID number. The guest IDs are non-negative integers. When using the
 server, since the server is a separate process, the VM ID will only be set when
-the device is ready (shared memory is received from the server and accessible via
-the device). If the device is not ready, the IVPosition will return -1.
+the device is ready (shared memory is received from the server and accessible
+via the device). If the device is not ready, the IVPosition will return -1.
 Applications should ensure that they have a valid VM ID before accessing the
 shared memory.
 
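
The IVPosition paragraph implies that a guest application should wait until IVPosition reports a non-negative ID before touching the shared memory. A minimal polling sketch in C follows, assuming BAR0 is already mapped and viewed as an array of 32-bit registers. The byte offset 8 for IVPosition comes from the full register enum, which this diff view cuts off after IntrMask; the function names and the bounded retry count are hypothetical choices.

```c
#include <stdint.h>

/* Byte offset of IVPosition in BAR0. Assumption: taken from the full
 * register enum (IntrMask = 0, IntrStatus = 4, IVPosition = 8,
 * Doorbell = 12), which this diff view truncates after IntrMask. */
enum { IVSHMEM_IVPOSITION = 8 };

/* BAR0 is viewed as an array of 32-bit registers, so the byte offset is
 * divided by 4. The register reads back as -1 until the device is ready. */
static int32_t ivshmem_read_ivposition(const volatile uint32_t *bar0)
{
    return (int32_t)bar0[IVSHMEM_IVPOSITION / 4];
}

/* Hypothetical helper: reads IVPosition at most max_tries times and returns
 * the guest ID as soon as it is non-negative, or -1 if it never becomes
 * valid. A real driver might sleep between reads instead of spinning. */
static int32_t ivshmem_wait_for_id(const volatile uint32_t *bar0, int max_tries)
{
    for (int i = 0; i < max_tries; i++) {
        int32_t id = ivshmem_read_ivposition(bar0);
        if (id >= 0)
            return id;
    }
    return -1;
}
```

A plain uint32_t array can stand in for the BAR0 mapping when exercising this logic outside a guest.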
@@ -59,8 +132,8 @@ Doorbell register. The doorbell register is 32-bits, logically divided into
 two 16-bit fields. The high 16-bits are the guest ID to interrupt and the low
 16-bits are the interrupt vector to trigger. The semantics of the value
 written to the doorbell depends on whether the device is using MSI or a regular
-pin-based interrupt. In short, MSI uses vectors while regular interrupts set the
-status register.
+pin-based interrupt. In short, MSI uses vectors while regular interrupts set
+the status register.
 
 Regular Interrupts
 
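
The Doorbell paragraph splits the 32-bit value into a 16-bit destination guest ID (high half) and a 16-bit interrupt vector (low half). The C helpers below, whose names are hypothetical, pack and unpack that layout. The 16-bit ID field is also why the server section limits client IDs to 65536.

```c
#include <stdint.h>

/* Packs a Doorbell value: destination guest ID in the high 16 bits,
 * interrupt vector in the low 16 bits. */
static uint32_t doorbell_value(uint16_t peer_id, uint16_t vector)
{
    return ((uint32_t)peer_id << 16) | vector;
}

/* Extracts the destination guest ID (high 16 bits). */
static uint16_t doorbell_peer(uint32_t value)
{
    return (uint16_t)(value >> 16);
}

/* Extracts the interrupt vector (low 16 bits). */
static uint16_t doorbell_vector(uint32_t value)
{
    return (uint16_t)(value & 0xFFFFu);
}
```

For example, doorbell_value(3, 1) is 0x00030001, which asks for vector 1 to be triggered in guest 3.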
@@ -71,7 +144,7 @@ interrupt in the destination guest.
 
 Message Signalled Interrupts
 
-A ivshmem device may support multiple MSI vectors. If so, the lower 16-bits
+An ivshmem device may support multiple MSI vectors. If so, the lower 16-bits
 written to the Doorbell register must be between 0 and the maximum number of
 vectors the guest supports. The lower 16 bits written to the doorbell is the
 MSI vector that will be raised in the destination guest. The number of MSI
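
For MSI, the text requires the low 16 bits of the Doorbell value to be a vector the guest supports. The sketch below checks that before writing the register. It reads "between 0 and the maximum number of vectors" as the half-open range [0, nvectors), since vectors are numbered from 0; that reading, the ivshmem_notify name and its nvectors parameter are assumptions, and the Doorbell byte offset 12 comes from the full register enum that this diff view truncates.

```c
#include <stdint.h>

/* Byte offset of the Doorbell register in BAR0 (assumed from the full
 * register enum, which this diff view truncates after IntrMask). */
enum { IVSHMEM_DOORBELL = 12 };

/* Hypothetical helper: asks the device to raise MSI vector `vector` in the
 * guest whose ID is `peer_id`. `nvectors` is the number of vectors available;
 * an out-of-range vector is rejected without touching the register.
 * Returns 0 on success, -1 on a bad vector. */
static int ivshmem_notify(volatile uint32_t *bar0, uint16_t peer_id,
                          uint16_t vector, uint16_t nvectors)
{
    if (vector >= nvectors)
        return -1;

    /* BAR0 is viewed as 32-bit registers, hence the division by 4. */
    bar0[IVSHMEM_DOORBELL / 4] = ((uint32_t)peer_id << 16) | vector;
    return 0;
}
```

Rejecting the vector in software keeps a bad value from ever reaching the device; what the device itself does with an out-of-range vector is not covered by this section.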
@@ -83,14 +156,3 @@ interrupt itself should be communicated via the shared memory region. Devices
 supporting multiple MSI vectors can use different vectors to indicate different
 events have occurred. The semantics of interrupt vectors are left to the
 user's discretion.
-
-
-Usage in the Guest
--------------------
-
-The shared memory device is intended to be used with the provided UIO driver.
-Very little configuration is needed. The guest should map BAR0 to access the
-registers (an array of 32-bit ints allows simple writing) and map BAR2 to
-access the shared memory region itself. The size of the shared memory region
-is specified when the guest (or shared memory server) is started. A guest may
-map the whole shared memory region or only part of it.
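
Both the removed 'Usage in the Guest' text and the new BARs text say that a guest which only needs the shared memory maps BAR2, either whole or in part. One way to do that from Linux userspace, sketched below as an alternative to the UIO driver the old text mentions, is to mmap() the PCI resource file that sysfs exposes for the BAR. The function name and the example PCI address are hypothetical; the offset must be page-aligned, which is what allows mapping only part of the region.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: maps `len` bytes of the file at `path`, starting at
 * byte `offset`, as a shared read/write mapping. In a Linux guest, `path`
 * could be the sysfs file for BAR2, e.g.
 * /sys/bus/pci/devices/0000:00:04.0/resource2 (hypothetical PCI address).
 * `offset` must be a multiple of the page size. Returns MAP_FAILED on error. */
static void *ivshmem_map(const char *path, off_t offset, size_t len)
{
    int fd = open(path, O_RDWR);
    if (fd < 0)
        return MAP_FAILED;

    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, offset);

    /* The mapping stays valid after its file descriptor is closed. */
    close(fd);
    return p;
}
```

Because the helper only needs a path, an ordinary file can stand in for the BAR when testing it outside a guest; MAP_SHARED makes stores through the mapping visible to other users of the same object, which is the point of a shared region.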