27af7d6ea5

Avoid replacing hot pages in the XBZRLE cache, which significantly decreases
cache misses.

Sample results with the test program quoted from xbzrle.txt, run in a VM
(migration bandwidth: 1GE, XBZRLE cache size: 8MB).

The test program:
#include <stdlib.h>
#include <stdio.h>
int main()
 {
        char *buf = (char *) calloc(4096, 4096);
        while (1) {
            int i;
            for (i = 0; i < 4096 * 4; i++) {
                buf[i * 4096 / 4]++;
            }
            printf(".");
        }
 }
before this patch:
virsh qemu-monitor-command test_vm '{"execute": "query-migrate"}'
{"return":{"expected-downtime":1020,"xbzrle-cache":{"bytes":1108284,
"cache-size":8388608,"cache-miss-rate":0.987013,"pages":18297,"overflow":8,
"cache-miss":1228737},"status":"active","setup-time":10,"total-time":52398,
"ram":{"total":12466991104,"remaining":1695744,"mbps":935.559472,
"transferred":5780760580,"dirty-sync-counter":271,"duplicate":2878530,
"dirty-pages-rate":29130,"skipped":0,"normal-bytes":5748592640,
"normal":1403465}},"id":"libvirt-706"}
About 18k pages were sent compressed in 52 seconds.
The cache-miss-rate is 98.7%, so nearly every lookup misses.
after optimizing:
virsh qemu-monitor-command test_vm '{"execute": "query-migrate"}'
{"return":{"expected-downtime":2054,"xbzrle-cache":{"bytes":5066763,
"cache-size":8388608,"cache-miss-rate":0.485924,"pages":194823,"overflow":0,
"cache-miss":210653},"status":"active","setup-time":11,"total-time":18729,
"ram":{"total":12466991104,"remaining":3895296,"mbps":937.663549,
"transferred":1615042219,"dirty-sync-counter":98,"duplicate":2869840,
"dirty-pages-rate":58781,"skipped":0,"normal-bytes":1588404224,
"normal":387794}},"id":"libvirt-266"}
About 194k pages were sent compressed in 18 seconds.
The cache-miss-rate decreases to 48.59%.
Signed-off-by: ChenLiang <chenliang88@huawei.com>
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Amit Shah <amit.shah@redhat.com>

137 lines, 4.8 KiB, Plaintext

XBZRLE (Xor Based Zero Run Length Encoding)
===========================================

Using XBZRLE (Xor Based Zero Run Length Encoding) allows for the reduction
of VM downtime and the total live-migration time of virtual machines.
It is particularly useful for virtual machines running memory-write-intensive
workloads that are typical of large enterprise applications such as SAP ERP
systems, and generally speaking for any application that uses a sparse memory
update pattern.

Instead of sending the changed guest memory page, this solution sends a
compressed version of the updates, thus reducing the amount of data sent during
live migration.
In order to be able to calculate the update, the previous memory pages need to
be stored on the source. Those pages are stored in a dedicated cache
(hash table) and are accessed by their address.
The larger the cache size, the better the chances are that the page has already
been stored in the cache.
A small cache size will result in a high cache miss rate.
The cache size can be changed before and during migration.

Format
======

The compression format performs an XOR between the previous and current content
of the page, where zero represents an unchanged value.
The page data delta is represented by zero and non-zero runs.
A zero run is represented by its length (in bytes).
A non-zero run is represented by its length (in bytes) and the new data.
The run length is encoded using ULEB128 (http://en.wikipedia.org/wiki/LEB128).

There can be more than one valid encoding; the sender may send a longer encoding
to reduce computation cost.

page = zrun nzrun
       | zrun nzrun page

zrun = length

nzrun = length byte...

length = uleb128 encoded integer

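As a minimal sketch of the length encoding, the helpers below write and read a
ULEB128 integer: seven payload bits per byte, least significant group first,
with the high bit set on every byte except the last. The names uleb128_put and
uleb128_get are hypothetical and are not taken from the QEMU source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: write v as ULEB128 into out, return the byte count.
 * out must have room for 5 bytes (enough for any 32-bit value). */
static size_t uleb128_put(uint8_t *out, uint32_t v)
{
    size_t n = 0;

    assert(out != NULL);
    do {
        uint8_t b = v & 0x7f;   /* low seven bits */
        v >>= 7;
        if (v != 0) {
            b |= 0x80;          /* more bytes follow */
        }
        out[n++] = b;
    } while (v != 0);
    return n;
}

/* Hypothetical helper: read a ULEB128 value from at most avail bytes.
 * Returns the number of bytes consumed, or 0 if the input is truncated
 * or longer than 5 bytes. */
static size_t uleb128_get(const uint8_t *in, size_t avail, uint32_t *v)
{
    uint32_t result = 0;
    size_t n = 0;

    assert(in != NULL && v != NULL);
    while (n < avail && n < 5) {
        uint8_t b = in[n];
        result |= (uint32_t)(b & 0x7f) << (7 * n);
        n++;
        if ((b & 0x80) == 0) {  /* last byte of this integer */
            *v = result;
            return n;
        }
    }
    return 0;
}
```

With these helpers, a run length of 1001 is written as the two bytes e9 07,
and any length below 128 fits in a single byte.
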
On the sender side XBZRLE is used as a compact delta encoding of page updates,
retrieving the old page content from the cache (default size of 64 MB). The
receiving side uses the existing page's content and XBZRLE to decode the new
page's content.

This work was originally based on research results published at
VEE 2011: Evaluation of Delta Compression Techniques for Efficient Live
Migration of Large Virtual Machines by Benoit, Svard, Tordsson and Elmroth.
Additionally, the delta encoder XBRLE was improved further by using XBZRLE
instead.

XBZRLE has a sustained bandwidth of 2-2.5 GB/s for typical workloads, making it
ideal for in-line, real-time encoding such as is needed for live migration.

Example
old buffer:
1001 zeros
05 06 07 08 09 0a 0b 0c 0d 0e 0f 10 11 12 13 68 00 00 6b 00 6d
3074 zeros

new buffer:
1001 zeros
01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f 68 00 00 67 00 69
3074 zeros

encoded buffer:

encoded length 24
e9 07 0f 01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f 03 01 67 01 01 69

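The example can be reproduced with a small encoder and decoder. This is a
sketch written from the format description above, not the QEMU implementation:
the names put_len, get_len, sketch_encode and sketch_decode are hypothetical,
and the encoder always picks the shortest runs rather than trading size for
speed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: write v as ULEB128, return bytes written (at most 5). */
static size_t put_len(uint8_t *out, uint32_t v)
{
    size_t n = 0;
    do {
        uint8_t b = v & 0x7f;
        v >>= 7;
        out[n++] = v ? (b | 0x80) : b;
    } while (v);
    return n;
}

/* Hypothetical helper: read a ULEB128 value, return bytes consumed or 0. */
static size_t get_len(const uint8_t *in, size_t avail, uint32_t *v)
{
    uint32_t r = 0;
    size_t n = 0;
    while (n < avail && n < 5) {
        uint8_t b = in[n];
        r |= (uint32_t)(b & 0x7f) << (7 * n);
        n++;
        if (!(b & 0x80)) {
            *v = r;
            return n;
        }
    }
    return 0;
}

/* Encode cur against old (both len bytes) into dst.
 * Returns the encoded length, or -1 if it does not fit in dlen bytes. */
static long sketch_encode(const uint8_t *old, const uint8_t *cur, size_t len,
                          uint8_t *dst, size_t dlen)
{
    size_t i = 0, d = 0;

    assert(old && cur && dst);
    while (i < len) {
        uint8_t hdr[10];
        size_t zrun = 0, start, nzrun, h;

        while (i < len && old[i] == cur[i]) {   /* unchanged bytes */
            i++;
            zrun++;
        }
        if (i == len) {
            break;              /* a trailing zero run is not sent */
        }
        start = i;
        while (i < len && old[i] != cur[i]) {   /* changed bytes */
            i++;
        }
        nzrun = i - start;
        h = put_len(hdr, (uint32_t)zrun);
        h += put_len(hdr + h, (uint32_t)nzrun);
        if (h + nzrun > dlen - d) {
            return -1;          /* overflow: the delta does not fit */
        }
        memcpy(dst + d, hdr, h);
        memcpy(dst + d + h, cur + start, nzrun);
        d += h + nzrun;
    }
    return (long)d;
}

/* Apply an encoded delta to page, which holds the old content on entry.
 * Returns 0 on success, -1 on malformed input. */
static int sketch_decode(const uint8_t *src, size_t slen,
                         uint8_t *page, size_t len)
{
    size_t i = 0, d = 0;

    assert(src && page);
    while (i < slen) {
        uint32_t zrun, nzrun;
        size_t n;

        n = get_len(src + i, slen - i, &zrun);
        if (n == 0 || zrun > len - d) return -1;
        i += n;
        d += zrun;              /* skip unchanged bytes */
        n = get_len(src + i, slen - i, &nzrun);
        if (n == 0 || nzrun == 0) return -1;
        i += n;
        if (nzrun > len - d || nzrun > slen - i) return -1;
        memcpy(page + d, src + i, nzrun);       /* copy the new data */
        i += nzrun;
        d += nzrun;
    }
    return 0;
}
```

Calling sketch_encode on the old and new buffers above yields the 24 bytes
shown, and sketch_decode applied to a copy of the old buffer restores the new
one. The trailing 3074 unchanged bytes cost nothing because a final zero run
is simply omitted.
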
Cache update strategy
=====================
Keeping hot pages in the cache is an effective way to reduce cache
misses. XBZRLE uses a counter as the age of each page. The counter
increases after each RAM dirty bitmap sync. When a cache conflict is
detected, XBZRLE will only evict pages in the cache that are older than
a threshold.

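The eviction rule can be sketched as follows. Everything here is an
assumption made for illustration: the direct-mapped layout, the four-slot
size, the threshold of two sync rounds and all sketch_ names are hypothetical
and do not describe the actual QEMU cache code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_SLOTS     4      /* assumption: tiny direct-mapped cache */
#define SKETCH_PAGE_SIZE 4096u
#define SKETCH_MIN_AGE   2      /* assumption: threshold in sync rounds */

struct sketch_slot {
    bool used;
    uint64_t addr;              /* page address stored in this slot */
    uint64_t age;               /* sync counter value at last insert */
};

static struct sketch_slot sketch_cache[SKETCH_SLOTS];

/* Map a page address to its slot. */
static unsigned sketch_index(uint64_t addr)
{
    return (unsigned)((addr / SKETCH_PAGE_SIZE) % SKETCH_SLOTS);
}

/* Try to cache the page at addr during sync round now (non-decreasing).
 * Returns true if the page is in the cache after the call, false if a
 * younger page occupying the same slot was kept instead. */
static bool sketch_cache_insert(uint64_t addr, uint64_t now)
{
    struct sketch_slot *s = &sketch_cache[sketch_index(addr)];

    assert(addr % SKETCH_PAGE_SIZE == 0);
    if (s->used && s->addr != addr && now - s->age < SKETCH_MIN_AGE) {
        return false;           /* conflict with a hot page: do not evict */
    }
    s->used = true;
    s->addr = addr;
    s->age = now;               /* insert or refresh */
    return true;
}
```

In this sketch a conflicting page is refused while the resident page is
younger than the threshold and replaces it once the resident page has aged
past it, so a page that is touched in every round stays cached.
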
Usage
=====
1. Verify the destination QEMU version is able to decode the new format.
    {qemu} info migrate_capabilities
    {qemu} xbzrle: off , ...

2. Activate xbzrle on both source and destination:
    {qemu} migrate_set_capability xbzrle on

3. Set the XBZRLE cache size (on source only). The cache size is in MBytes and
should be a power of 2. The default cache size is 64 MBytes.
    {qemu} migrate_set_cache_size 256m

4. Start outgoing migration:
    {qemu} migrate -d tcp:destination.host:4444
    {qemu} info migrate
    capabilities: xbzrle: on
    Migration status: active
    transferred ram: A kbytes
    remaining ram: B kbytes
    total ram: C kbytes
    total time: D milliseconds
    duplicate: E pages
    normal: F pages
    normal bytes: G kbytes
    cache size: H bytes
    xbzrle transferred: I kbytes
    xbzrle pages: J pages
    xbzrle cache miss: K
    xbzrle overflow : L

xbzrle cache-miss: the number of cache misses to date. A high cache-miss rate
indicates that the cache size is set too low.
xbzrle overflow: the number of overflows in the encoding, where the delta
could not be compressed. This can happen if the changes in the pages are too
large or there are many short changes; for example, changing every second byte
(half a page).

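To see why changing every second byte overflows, count the encoded bytes: each
changed byte costs a zero-run length, a non-zero-run length and one data byte,
so 2048 changed bytes in a 4096-byte page need 6144 bytes. The sketch below
computes that size for any pair of buffers. The function names are
hypothetical, and it assumes an overflow means that the encoding would be
larger than the page itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Number of bytes needed to store v as ULEB128. */
static size_t sketch_uleb128_len(size_t v)
{
    size_t n = 1;
    while ((v >>= 7) != 0) {
        n++;
    }
    return n;
}

/* Size in bytes of the zrun/nzrun encoding of cur against old. */
static size_t sketch_encoded_size(const uint8_t *old, const uint8_t *cur,
                                  size_t len)
{
    size_t i = 0, total = 0;

    assert(old != NULL && cur != NULL);
    while (i < len) {
        size_t zrun = 0, nzrun = 0;

        while (i < len && old[i] == cur[i]) {   /* unchanged bytes */
            i++;
            zrun++;
        }
        if (i == len) {
            break;              /* a trailing zero run is not sent */
        }
        while (i < len && old[i] != cur[i]) {   /* changed bytes */
            i++;
            nzrun++;
        }
        /* two length fields plus the new data itself */
        total += sketch_uleb128_len(zrun) + sketch_uleb128_len(nzrun) + nzrun;
    }
    return total;
}
```

For a 4096-byte page in which every odd-indexed byte differs,
sketch_encoded_size returns 6144, so sending the page uncompressed is cheaper.
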
Testing: live migration with XBZRLE completed in 110 seconds, whereas without
it the migration was not able to complete.

A simple synthetic memory r/w load generator:
..    #include <stdlib.h>
..    #include <stdio.h>
..    int main()
..    {
..        char *buf = (char *) calloc(4096, 4096);
..        while (1) {
..            int i;
..            for (i = 0; i < 4096 * 4; i++) {
..                buf[i * 4096 / 4]++;
..            }
..            printf(".");
..        }
..    }