 3e866365e1
			
		
	
	
		3e866365e1
		
	
	
	
	
		
			
			A new vhost user message is added to allow QEMU to ask to vhost user backend to broadcast a fake RARP after live migration for guest without GUEST_ANNOUNCE capability. This new message is sent only if the backend supports the new VHOST_USER_PROTOCOL_F_RARP protocol feature. The payload of this new message is the MAC address of the guest (not known by the backend). The MAC address is copied in the first 6 bytes of a u64 to avoid to create a new payload message type. This new message has no equivalent ioctl so a new callback is added in the userOps structure to send the request. Upon reception of this new message the vhost user backend must generate and broadcast a fake RARP request to notify the migration is terminated. Signed-off-by: Thibaut Collet <thibaut.collet@6wind.com> [Rebased and fixed checkpatch errors - Marc-André] Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Tested-by: Thibaut Collet <thibaut.collet@6wind.com>
		
			
				
	
	
		
			399 lines
		
	
	
		
			13 KiB
		
	
	
	
		
			Plaintext
		
	
	
	
	
	
			
		
		
	
	
			399 lines
		
	
	
		
			13 KiB
		
	
	
	
		
			Plaintext
		
	
	
	
	
	
| Vhost-user Protocol
 | |
| ===================
 | |
| 
 | |
| Copyright (c) 2014 Virtual Open Systems Sarl.
 | |
| 
 | |
| This work is licensed under the terms of the GNU GPL, version 2 or later.
 | |
| See the COPYING file in the top-level directory.
 | |
| ===================
 | |
| 
 | |
| This protocol is aiming to complement the ioctl interface used to control the
 | |
| vhost implementation in the Linux kernel. It implements the control plane needed
 | |
| to establish virtqueue sharing with a user space process on the same host. It
 | |
| uses communication over a Unix domain socket to share file descriptors in the
 | |
| ancillary data of the message.
 | |
| 
 | |
| The protocol defines 2 sides of the communication, master and slave. Master is
 | |
| the application that shares its virtqueues, in our case QEMU. Slave is the
 | |
| consumer of the virtqueues.
 | |
| 
 | |
| In the current implementation QEMU is the Master, and the Slave is intended to
 | |
| be a software Ethernet switch running in user space, such as Snabbswitch.
 | |
| 
 | |
| Master and slave can be either a client (i.e. connecting) or server (listening)
 | |
| in the socket communication.
 | |
| 
 | |
| Message Specification
 | |
| ---------------------
 | |
| 
 | |
| Note that all numbers are in the machine native byte order. A vhost-user message
 | |
| consists of 3 header fields and a payload:
 | |
| 
 | |
| ------------------------------------
 | |
| | request | flags | size | payload |
 | |
| ------------------------------------
 | |
| 
 | |
|  * Request: 32-bit type of the request
 | |
|  * Flags: 32-bit bit field:
 | |
|    - Lower 2 bits are the version (currently 0x01)
 | |
|    - Bit 2 is the reply flag - needs to be sent on each reply from the slave
 | |
|  * Size - 32-bit size of the payload
 | |
| 
 | |
| 
 | |
| Depending on the request type, payload can be:
 | |
| 
 | |
|  * A single 64-bit integer
 | |
|    -------
 | |
|    | u64 |
 | |
|    -------
 | |
| 
 | |
|    u64: a 64-bit unsigned integer
 | |
| 
 | |
|  * A vring state description
 | |
|    ---------------
 | |
|   | index | num |
 | |
|   ---------------
 | |
| 
 | |
|    Index: a 32-bit index
 | |
|    Num: a 32-bit number
 | |
| 
 | |
|  * A vring address description
 | |
|    --------------------------------------------------------------
 | |
|    | index | flags | size | descriptor | used | available | log |
 | |
|    --------------------------------------------------------------
 | |
| 
 | |
|    Index: a 32-bit vring index
 | |
|    Flags: a 32-bit vring flags
 | |
|    Descriptor: a 64-bit user address of the vring descriptor table
 | |
|    Used: a 64-bit user address of the vring used ring
 | |
|    Available: a 64-bit user address of the vring available ring
 | |
|    Log: a 64-bit guest address for logging
 | |
| 
 | |
|  * Memory regions description
 | |
|    ---------------------------------------------------
 | |
|    | num regions | padding | region0 | ... | region7 |
 | |
|    ---------------------------------------------------
 | |
| 
 | |
|    Num regions: a 32-bit number of regions
 | |
|    Padding: 32-bit
 | |
| 
 | |
|    A region is:
 | |
|    -----------------------------------------------------
 | |
|    | guest address | size | user address | mmap offset |
 | |
|    -----------------------------------------------------
 | |
| 
 | |
|    Guest address: a 64-bit guest address of the region
 | |
|    Size: a 64-bit size
 | |
|    User address: a 64-bit user address
 | |
|    mmap offset: 64-bit offset where region starts in the mapped memory
 | |
| 
 | |
| In QEMU the vhost-user message is implemented with the following struct:
 | |
| 
 | |
| typedef struct VhostUserMsg {
 | |
|     VhostUserRequest request;
 | |
|     uint32_t flags;
 | |
|     uint32_t size;
 | |
|     union {
 | |
|         uint64_t u64;
 | |
|         struct vhost_vring_state state;
 | |
|         struct vhost_vring_addr addr;
 | |
|         VhostUserMemory memory;
 | |
|     };
 | |
| } QEMU_PACKED VhostUserMsg;
 | |
| 
 | |
| Communication
 | |
| -------------
 | |
| 
 | |
| The protocol for vhost-user is based on the existing implementation of vhost
 | |
| for the Linux Kernel. Most messages that can be sent via the Unix domain socket
 | |
| implementing vhost-user have an equivalent ioctl to the kernel implementation.
 | |
| 
 | |
| The communication consists of master sending message requests and slave sending
 | |
| message replies. Most of the requests don't require replies. Here is a list of
 | |
| the ones that do:
 | |
| 
 | |
|  * VHOST_GET_FEATURES
 | |
|  * VHOST_GET_PROTOCOL_FEATURES
 | |
|  * VHOST_GET_VRING_BASE
 | |
|  * VHOST_SET_LOG_BASE (if VHOST_USER_PROTOCOL_F_LOG_SHMFD)
 | |
| 
 | |
| There are several messages that the master sends with file descriptors passed
 | |
| in the ancillary data:
 | |
| 
 | |
|  * VHOST_SET_MEM_TABLE
 | |
|  * VHOST_SET_LOG_BASE (if VHOST_USER_PROTOCOL_F_LOG_SHMFD)
 | |
|  * VHOST_SET_LOG_FD
 | |
|  * VHOST_SET_VRING_KICK
 | |
|  * VHOST_SET_VRING_CALL
 | |
|  * VHOST_SET_VRING_ERR
 | |
| 
 | |
| If Master is unable to send the full message or receives a wrong reply it will
 | |
| close the connection. An optional reconnection mechanism can be implemented.
 | |
| 
 | |
| Any protocol extensions are gated by protocol feature bits,
 | |
| which allows full backwards compatibility on both master
 | |
| and slave.
 | |
| As older slaves don't support negotiating protocol features,
 | |
| a feature bit was dedicated for this purpose:
 | |
| #define VHOST_USER_F_PROTOCOL_FEATURES 30
 | |
| 
 | |
| Multiple queue support
 | |
| ----------------------
 | |
| 
 | |
| Multiple queue is treated as a protocol extension, hence the slave has to
 | |
| implement protocol features first. The multiple queues feature is supported
 | |
| only when the protocol feature VHOST_USER_PROTOCOL_F_MQ (bit 0) is set.
 | |
| 
 | |
| The max number of queues the slave supports can be queried with message
 | |
| VHOST_USER_GET_PROTOCOL_FEATURES. Master should stop when the number of
 | |
| requested queues is bigger than that.
 | |
| 
 | |
| As all queues share one connection, the master uses a unique index for each
 | |
| queue in the sent message to identify a specified queue. One queue pair
 | |
| is enabled initially. More queues are enabled dynamically, by sending
 | |
| message VHOST_USER_SET_VRING_ENABLE.
 | |
| 
 | |
| Migration
 | |
| ---------
 | |
| 
 | |
| During live migration, the master may need to track the modifications
 | |
| the slave makes to the memory mapped regions. The client should mark
 | |
| the dirty pages in a log. Once it complies to this logging, it may
 | |
| declare the VHOST_F_LOG_ALL vhost feature.
 | |
| 
 | |
| All the modifications to memory pointed by vring "descriptor" should
 | |
| be marked. Modifications to "used" vring should be marked if
 | |
| VHOST_VRING_F_LOG is part of ring's features.
 | |
| 
 | |
| Dirty pages are of size:
 | |
| #define VHOST_LOG_PAGE 0x1000
 | |
| 
 | |
| The log memory fd is provided in the ancillary data of
 | |
| VHOST_USER_SET_LOG_BASE message when the slave has
 | |
| VHOST_USER_PROTOCOL_F_LOG_SHMFD protocol feature.
 | |
| 
 | |
| The size of the log may be computed by using all the known guest
 | |
| addresses. The log covers from address 0 to the maximum of guest
 | |
| regions. In pseudo-code, to mark page at "addr" as dirty:
 | |
| 
 | |
| page = addr / VHOST_LOG_PAGE
 | |
| log[page / 8] |= 1 << page % 8
 | |
| 
 | |
| Use atomic operations, as the log may be concurrently manipulated.
 | |
| 
 | |
| VHOST_USER_SET_LOG_FD is an optional message with an eventfd in
 | |
| ancillary data, it may be used to inform the master that the log has
 | |
| been modified.
 | |
| 
 | |
| Once the source has finished migration, VHOST_USER_RESET_OWNER message
 | |
| will be sent by the source. No further update must be done before the
 | |
| destination takes over with new regions & rings.
 | |
| 
 | |
| Protocol features
 | |
| -----------------
 | |
| 
 | |
| #define VHOST_USER_PROTOCOL_F_MQ             0
 | |
| #define VHOST_USER_PROTOCOL_F_LOG_SHMFD      1
 | |
| #define VHOST_USER_PROTOCOL_F_RARP           2
 | |
| 
 | |
| Message types
 | |
| -------------
 | |
| 
 | |
|  * VHOST_USER_GET_FEATURES
 | |
| 
 | |
|       Id: 1
 | |
|       Equivalent ioctl: VHOST_GET_FEATURES
 | |
|       Master payload: N/A
 | |
|       Slave payload: u64
 | |
| 
 | |
|       Get from the underlying vhost implementation the features bitmask.
 | |
|       Feature bit VHOST_USER_F_PROTOCOL_FEATURES signals slave support for
 | |
|       VHOST_USER_GET_PROTOCOL_FEATURES and VHOST_USER_SET_PROTOCOL_FEATURES.
 | |
| 
 | |
|  * VHOST_USER_SET_FEATURES
 | |
| 
 | |
|       Id: 2
 | |
|       Ioctl: VHOST_SET_FEATURES
 | |
|       Master payload: u64
 | |
| 
 | |
|       Enable features in the underlying vhost implementation using a bitmask.
 | |
|       Feature bit VHOST_USER_F_PROTOCOL_FEATURES signals slave support for
 | |
|       VHOST_USER_GET_PROTOCOL_FEATURES and VHOST_USER_SET_PROTOCOL_FEATURES.
 | |
| 
 | |
|  * VHOST_USER_GET_PROTOCOL_FEATURES
 | |
| 
 | |
|       Id: 15
 | |
|       Equivalent ioctl: VHOST_GET_FEATURES
 | |
|       Master payload: N/A
 | |
|       Slave payload: u64
 | |
| 
 | |
|       Get the protocol feature bitmask from the underlying vhost implementation.
 | |
|       Only legal if feature bit VHOST_USER_F_PROTOCOL_FEATURES is present in
 | |
|       VHOST_USER_GET_FEATURES.
 | |
|       Note: slave that reported VHOST_USER_F_PROTOCOL_FEATURES must support
 | |
|       this message even before VHOST_USER_SET_FEATURES was called.
 | |
| 
 | |
|  * VHOST_USER_SET_PROTOCOL_FEATURES
 | |
| 
 | |
|       Id: 16
 | |
|       Ioctl: VHOST_SET_FEATURES
 | |
|       Master payload: u64
 | |
| 
 | |
|       Enable protocol features in the underlying vhost implementation.
 | |
|       Only legal if feature bit VHOST_USER_F_PROTOCOL_FEATURES is present in
 | |
|       VHOST_USER_GET_FEATURES.
 | |
|       Note: slave that reported VHOST_USER_F_PROTOCOL_FEATURES must support
 | |
|       this message even before VHOST_USER_SET_FEATURES was called.
 | |
| 
 | |
|  * VHOST_USER_SET_OWNER
 | |
| 
 | |
|       Id: 3
 | |
|       Equivalent ioctl: VHOST_SET_OWNER
 | |
|       Master payload: N/A
 | |
| 
 | |
|       Issued when a new connection is established. It sets the current Master
 | |
|       as an owner of the session. This can be used on the Slave as a
 | |
|       "session start" flag.
 | |
| 
 | |
|  * VHOST_USER_RESET_DEVICE
 | |
| 
 | |
|       Id: 4
 | |
|       Equivalent ioctl: VHOST_RESET_DEVICE
 | |
|       Master payload: N/A
 | |
| 
 | |
|       Issued when a new connection is about to be closed. The Master will no
 | |
|       longer own this connection (and will usually close it).
 | |
| 
 | |
|  * VHOST_USER_SET_MEM_TABLE
 | |
| 
 | |
|       Id: 5
 | |
|       Equivalent ioctl: VHOST_SET_MEM_TABLE
 | |
|       Master payload: memory regions description
 | |
| 
 | |
|       Sets the memory map regions on the slave so it can translate the vring
 | |
|       addresses. In the ancillary data there is an array of file descriptors
 | |
|       for each memory mapped region. The size and ordering of the fds matches
 | |
|       the number and ordering of memory regions.
 | |
| 
 | |
|  * VHOST_USER_SET_LOG_BASE
 | |
| 
 | |
|       Id: 6
 | |
|       Equivalent ioctl: VHOST_SET_LOG_BASE
 | |
|       Master payload: u64
 | |
|       Slave payload: N/A
 | |
| 
 | |
|       Sets the logging base address.
 | |
| 
 | |
|  * VHOST_USER_SET_LOG_FD
 | |
| 
 | |
|       Id: 7
 | |
|       Equivalent ioctl: VHOST_SET_LOG_FD
 | |
|       Master payload: N/A
 | |
| 
 | |
|       Sets the logging file descriptor, which is passed as ancillary data.
 | |
| 
 | |
|  * VHOST_USER_SET_VRING_NUM
 | |
| 
 | |
|       Id: 8
 | |
|       Equivalent ioctl: VHOST_SET_VRING_NUM
 | |
|       Master payload: vring state description
 | |
| 
 | |
|       Sets the number of vrings for this owner.
 | |
| 
 | |
|  * VHOST_USER_SET_VRING_ADDR
 | |
| 
 | |
|       Id: 9
 | |
|       Equivalent ioctl: VHOST_SET_VRING_ADDR
 | |
|       Master payload: vring address description
 | |
|       Slave payload: N/A
 | |
| 
 | |
|       Sets the addresses of the different aspects of the vring.
 | |
| 
 | |
|  * VHOST_USER_SET_VRING_BASE
 | |
| 
 | |
|       Id: 10
 | |
|       Equivalent ioctl: VHOST_SET_VRING_BASE
 | |
|       Master payload: vring state description
 | |
| 
 | |
|       Sets the base offset in the available vring.
 | |
| 
 | |
|  * VHOST_USER_GET_VRING_BASE
 | |
| 
 | |
|       Id: 11
 | |
|       Equivalent ioctl: VHOST_USER_GET_VRING_BASE
 | |
|       Master payload: vring state description
 | |
|       Slave payload: vring state description
 | |
| 
 | |
|       Get the available vring base offset.
 | |
| 
 | |
|  * VHOST_USER_SET_VRING_KICK
 | |
| 
 | |
|       Id: 12
 | |
|       Equivalent ioctl: VHOST_SET_VRING_KICK
 | |
|       Master payload: u64
 | |
| 
 | |
|       Set the event file descriptor for adding buffers to the vring. It
 | |
|       is passed in the ancillary data.
 | |
|       Bits (0-7) of the payload contain the vring index. Bit 8 is the
 | |
|       invalid FD flag. This flag is set when there is no file descriptor
 | |
|       in the ancillary data. This signals that polling should be used
 | |
|       instead of waiting for a kick.
 | |
| 
 | |
|  * VHOST_USER_SET_VRING_CALL
 | |
| 
 | |
|       Id: 13
 | |
|       Equivalent ioctl: VHOST_SET_VRING_CALL
 | |
|       Master payload: u64
 | |
| 
 | |
|       Set the event file descriptor to signal when buffers are used. It
 | |
|       is passed in the ancillary data.
 | |
|       Bits (0-7) of the payload contain the vring index. Bit 8 is the
 | |
|       invalid FD flag. This flag is set when there is no file descriptor
 | |
|       in the ancillary data. This signals that polling will be used
 | |
|       instead of waiting for the call.
 | |
| 
 | |
|  * VHOST_USER_SET_VRING_ERR
 | |
| 
 | |
|       Id: 14
 | |
|       Equivalent ioctl: VHOST_SET_VRING_ERR
 | |
|       Master payload: u64
 | |
| 
 | |
|       Set the event file descriptor to signal when error occurs. It
 | |
|       is passed in the ancillary data.
 | |
|       Bits (0-7) of the payload contain the vring index. Bit 8 is the
 | |
|       invalid FD flag. This flag is set when there is no file descriptor
 | |
|       in the ancillary data.
 | |
| 
 | |
|  * VHOST_USER_GET_QUEUE_NUM
 | |
| 
 | |
|       Id: 17
 | |
|       Equivalent ioctl: N/A
 | |
|       Master payload: N/A
 | |
|       Slave payload: u64
 | |
| 
 | |
|       Query how many queues the backend supports. This request should be
 | |
|       sent only when VHOST_USER_PROTOCOL_F_MQ is set in quried protocol
 | |
|       features by VHOST_USER_GET_PROTOCOL_FEATURES.
 | |
| 
 | |
|  * VHOST_USER_SET_VRING_ENABLE
 | |
| 
 | |
|       Id: 18
 | |
|       Equivalent ioctl: N/A
 | |
|       Master payload: vring state description
 | |
| 
 | |
|       Signal slave to enable or disable corresponding vring.
 | |
| 
 | |
|  * VHOST_USER_SEND_RARP
 | |
| 
 | |
|       Id: 19
 | |
|       Equivalent ioctl: N/A
 | |
|       Master payload: u64
 | |
| 
 | |
|       Ask vhost user backend to broadcast a fake RARP to notify the migration
 | |
|       is terminated for guest that does not support GUEST_ANNOUNCE.
 | |
|       Only legal if feature bit VHOST_USER_F_PROTOCOL_FEATURES is present in
 | |
|       VHOST_USER_GET_FEATURES and protocol feature bit VHOST_USER_PROTOCOL_F_RARP
 | |
|       is present in VHOST_USER_GET_PROTOCOL_FEATURES.
 | |
|       The first 6 bytes of the payload contain the mac address of the guest to
 | |
|       allow the vhost user backend to construct and broadcast the fake RARP.
 |