From 8e28b65f52637917d3d02ade4e0ae85442de9f86 Mon Sep 17 00:00:00 2001 From: "Denis V. Lunev" Date: Thu, 7 Apr 2022 11:39:32 +0300 Subject: [PATCH 01/13] qemu-img: properly list formats which have consistency check implemented Simple grep for the .bdrv_co_check callback presence gives the following list of block drivers * QED * VDI * VHDX * VMDK * Parallels which have this callback. The presense of the callback means that consistency check is supported. The patch updates documentation accordingly. Signed-off-by: Denis V. Lunev CC: Kevin Wolf CC: Hanna Reitz Message-Id: <20220407083932.531965-1-den@openvz.org> Reviewed-by: Eric Blake Signed-off-by: Kevin Wolf --- docs/tools/qemu-img.rst | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/tools/qemu-img.rst b/docs/tools/qemu-img.rst index 8885ea11cf..85a6e05b35 100644 --- a/docs/tools/qemu-img.rst +++ b/docs/tools/qemu-img.rst @@ -332,8 +332,8 @@ Command description: ``-r all`` fixes all kinds of errors, with a higher risk of choosing the wrong fix or hiding corruption that has already occurred. - Only the formats ``qcow2``, ``qed`` and ``vdi`` support - consistency checks. + Only the formats ``qcow2``, ``qed``, ``parallels``, ``vhdx``, ``vmdk`` and + ``vdi`` support consistency checks. In case the image does not have any inconsistencies, check exits with ``0``. Other exit codes indicate the kind of inconsistency found or if another error From 31009d13cc505b9569971fbbe50266c1727ba5db Mon Sep 17 00:00:00 2001 From: Kevin Wolf Date: Thu, 7 Apr 2022 15:36:55 +0200 Subject: [PATCH 02/13] docs/vhost-user: Clarifications for VHOST_USER_ADD/REM_MEM_REG The specification for VHOST_USER_ADD/REM_MEM_REG messages is unclear in several points, which has led to clients having incompatible implementations. This changes the specification to be more explicit about them: * VHOST_USER_ADD_MEM_REG is not specified as receiving a file descriptor, though it obviously does need to do so. All implementations agree on this one, fix the specification. * VHOST_USER_REM_MEM_REG is not specified as receiving a file descriptor either, and it also has no reason to do so. rust-vmm does not send file descriptors for removing a memory region (in agreement with the specification), libvhost-user and QEMU do (which is a bug), though libvhost-user doesn't actually make any use of it. Change the specification so that for compatibility QEMU's behaviour becomes legal, even if discouraged, but rust-vmm's behaviour becomes the explicitly recommended mode of operation. * VHOST_USER_ADD_MEM_REG doesn't have a documented return value, which is the desired behaviour in the non-postcopy case. It also implemented like this in QEMU and rust-vmm, though libvhost-user is buggy and sometimes sends an unexpected reply. This will be fixed in a separate patch. However, in postcopy mode it does reply like VHOST_USER_SET_MEM_TABLE. This behaviour is shared between libvhost-user and QEMU; rust-vmm doesn't implement postcopy mode yet. Mention it explicitly in the spec. * The specification doesn't mention how VHOST_USER_REM_MEM_REG identifies the memory region to be removed. Change it to describe the existing behaviour of libvhost-user (guest address, user address and size must match). Signed-off-by: Kevin Wolf Message-Id: <20220407133657.155281-2-kwolf@redhat.com> Reviewed-by: Raphael Norwitz Reviewed-by: Stefan Hajnoczi Signed-off-by: Kevin Wolf --- docs/interop/vhost-user.rst | 17 +++++++++++++++++ 1 file changed, 17 insertions(+) diff --git a/docs/interop/vhost-user.rst b/docs/interop/vhost-user.rst index 4dbc84fd00..f9e721ba5f 100644 --- a/docs/interop/vhost-user.rst +++ b/docs/interop/vhost-user.rst @@ -308,6 +308,7 @@ replies. Here is a list of the ones that do: There are several messages that the master sends with file descriptors passed in the ancillary data: +* ``VHOST_USER_ADD_MEM_REG`` * ``VHOST_USER_SET_MEM_TABLE`` * ``VHOST_USER_SET_LOG_BASE`` (if ``VHOST_USER_PROTOCOL_F_LOG_SHMFD``) * ``VHOST_USER_SET_LOG_FD`` @@ -1334,6 +1335,14 @@ Master message types ``VHOST_USER_REM_MEM_REG`` message, this message is used to set and update the memory tables of the slave device. + Exactly one file descriptor from which the memory is mapped is + passed in the ancillary data. + + In postcopy mode (see ``VHOST_USER_POSTCOPY_LISTEN``), the slave + replies with the bases of the memory mapped region to the master. + For further details on postcopy, see ``VHOST_USER_SET_MEM_TABLE``. + They apply to ``VHOST_USER_ADD_MEM_REG`` accordingly. + ``VHOST_USER_REM_MEM_REG`` :id: 38 :equivalent ioctl: N/A @@ -1349,6 +1358,14 @@ Master message types ``VHOST_USER_ADD_MEM_REG`` message, this message is used to set and update the memory tables of the slave device. + The memory region to be removed is identified by its guest address, + user address and size. The mmap offset is ignored. + + No file descriptors SHOULD be passed in the ancillary data. For + compatibility with existing incorrect implementations, the slave MAY + accept messages with one file descriptor. If a file descriptor is + passed, the slave MUST close it without using it otherwise. + ``VHOST_USER_SET_STATUS`` :id: 39 :equivalent ioctl: VHOST_VDPA_SET_STATUS From 5ebfdeb21223e0406a711decbb2baceb5ec8c416 Mon Sep 17 00:00:00 2001 From: Kevin Wolf Date: Thu, 7 Apr 2022 15:36:56 +0200 Subject: [PATCH 03/13] libvhost-user: Fix extra vu_add/rem_mem_reg reply Outside of postcopy mode, neither VHOST_USER_ADD_MEM_REG nor VHOST_USER_REM_MEM_REG are supposed to send a reply unless explicitly requested with the need_reply flag. Their current implementation always sends a reply, even if it isn't requested. This confuses the master because it will interpret the reply as a reply for the next message for which it actually expects a reply. need_reply is already handled correctly by vu_dispatch(), so just don't send a reply in the non-postcopy part of the message handler for these two commands. Signed-off-by: Kevin Wolf Message-Id: <20220407133657.155281-3-kwolf@redhat.com> Reviewed-by: Raphael Norwitz Reviewed-by: Stefan Hajnoczi Signed-off-by: Kevin Wolf --- subprojects/libvhost-user/libvhost-user.c | 9 +++------ 1 file changed, 3 insertions(+), 6 deletions(-) diff --git a/subprojects/libvhost-user/libvhost-user.c b/subprojects/libvhost-user/libvhost-user.c index 47d2efc60f..eccaff5168 100644 --- a/subprojects/libvhost-user/libvhost-user.c +++ b/subprojects/libvhost-user/libvhost-user.c @@ -800,8 +800,7 @@ vu_add_mem_reg(VuDev *dev, VhostUserMsg *vmsg) { DPRINT("Successfully added new region\n"); dev->nregions++; - vmsg_set_reply_u64(vmsg, 0); - return true; + return false; } } @@ -874,15 +873,13 @@ vu_rem_mem_reg(VuDev *dev, VhostUserMsg *vmsg) { } } - if (found) { - vmsg_set_reply_u64(vmsg, 0); - } else { + if (!found) { vu_panic(dev, "Specified region not found\n"); } close(vmsg->fds[0]); - return true; + return false; } static bool From a81d8d4a7203e4be392f9ad37af104759b7cbda5 Mon Sep 17 00:00:00 2001 From: Kevin Wolf Date: Thu, 7 Apr 2022 15:36:57 +0200 Subject: [PATCH 04/13] vhost-user: Don't pass file descriptor for VHOST_USER_REM_MEM_REG The spec clarifies now that QEMU should not send a file descriptor in a request to remove a memory region. Change it accordingly. For libvhost-user, this is a bug fix that makes it compatible with rust-vmm's implementation that doesn't send a file descriptor. Keep accepting, but ignoring a file descriptor for compatibility with older QEMU versions. Signed-off-by: Kevin Wolf Message-Id: <20220407133657.155281-4-kwolf@redhat.com> Reviewed-by: Raphael Norwitz Reviewed-by: Stefan Hajnoczi Signed-off-by: Kevin Wolf --- hw/virtio/vhost-user.c | 2 +- subprojects/libvhost-user/libvhost-user.c | 8 ++++---- 2 files changed, 5 insertions(+), 5 deletions(-) diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c index a80315ecfc..2d434ff0bc 100644 --- a/hw/virtio/vhost-user.c +++ b/hw/virtio/vhost-user.c @@ -751,7 +751,7 @@ static int send_remove_regions(struct vhost_dev *dev, vhost_user_fill_msg_region(®ion_buffer, shadow_reg, 0); msg->payload.mem_reg.region = region_buffer; - ret = vhost_user_write(dev, msg, &fd, 1); + ret = vhost_user_write(dev, msg, NULL, 0); if (ret < 0) { return ret; } diff --git a/subprojects/libvhost-user/libvhost-user.c b/subprojects/libvhost-user/libvhost-user.c index eccaff5168..d0041c864b 100644 --- a/subprojects/libvhost-user/libvhost-user.c +++ b/subprojects/libvhost-user/libvhost-user.c @@ -822,15 +822,15 @@ vu_rem_mem_reg(VuDev *dev, VhostUserMsg *vmsg) { int i; bool found = false; - if (vmsg->fd_num != 1) { + if (vmsg->fd_num > 1) { vmsg_close_fds(vmsg); - vu_panic(dev, "VHOST_USER_REM_MEM_REG received %d fds - only 1 fd " + vu_panic(dev, "VHOST_USER_REM_MEM_REG received %d fds - at most 1 fd " "should be sent for this message type", vmsg->fd_num); return false; } if (vmsg->size < VHOST_USER_MEM_REG_SIZE) { - close(vmsg->fds[0]); + vmsg_close_fds(vmsg); vu_panic(dev, "VHOST_USER_REM_MEM_REG requires a message size of at " "least %d bytes and only %d bytes were received", VHOST_USER_MEM_REG_SIZE, vmsg->size); @@ -877,7 +877,7 @@ vu_rem_mem_reg(VuDev *dev, VhostUserMsg *vmsg) { vu_panic(dev, "Specified region not found\n"); } - close(vmsg->fds[0]); + vmsg_close_fds(vmsg); return false; } From 15aee7ac9587562b1d021959dedc2bbb7e9a9fb3 Mon Sep 17 00:00:00 2001 From: Hanna Reitz Date: Wed, 27 Apr 2022 13:40:54 +0200 Subject: [PATCH 05/13] block: Classify bdrv_get_flags() as I/O function This function is safe to call in an I/O context, and qcow2_do_open() does so (invoked in an I/O context by qcow2_co_invalidate_cache()). Signed-off-by: Hanna Reitz Message-Id: <20220427114057.36651-2-hreitz@redhat.com> Reviewed-by: Eric Blake Signed-off-by: Kevin Wolf --- block.c | 2 +- include/block/block-global-state.h | 1 - include/block/block-io.h | 1 + 3 files changed, 2 insertions(+), 2 deletions(-) diff --git a/block.c b/block.c index 8cd16e757e..2c00dddd80 100644 --- a/block.c +++ b/block.c @@ -6298,7 +6298,7 @@ const char *bdrv_get_device_or_node_name(const BlockDriverState *bs) int bdrv_get_flags(BlockDriverState *bs) { - GLOBAL_STATE_CODE(); + IO_CODE(); return bs->open_flags; } diff --git a/include/block/block-global-state.h b/include/block/block-global-state.h index 25bb69bbef..21265e3966 100644 --- a/include/block/block-global-state.h +++ b/include/block/block-global-state.h @@ -172,7 +172,6 @@ void bdrv_next_cleanup(BdrvNextIterator *it); BlockDriverState *bdrv_next_monitor_owned(BlockDriverState *bs); void bdrv_iterate_format(void (*it)(void *opaque, const char *name), void *opaque, bool read_only); -int bdrv_get_flags(BlockDriverState *bs); char *bdrv_get_full_backing_filename(BlockDriverState *bs, Error **errp); char *bdrv_dirname(BlockDriverState *bs, Error **errp); diff --git a/include/block/block-io.h b/include/block/block-io.h index 5e3f346806..62c84f0519 100644 --- a/include/block/block-io.h +++ b/include/block/block-io.h @@ -103,6 +103,7 @@ int bdrv_apply_auto_read_only(BlockDriverState *bs, const char *errmsg, bool bdrv_is_read_only(BlockDriverState *bs); bool bdrv_is_writable(BlockDriverState *bs); bool bdrv_is_sg(BlockDriverState *bs); +int bdrv_get_flags(BlockDriverState *bs); bool bdrv_is_inserted(BlockDriverState *bs); void bdrv_lock_medium(BlockDriverState *bs, bool locked); void bdrv_eject(BlockDriverState *bs, bool eject_flag); From 06e9cd19a4e9acf8d24327361a999943d846f607 Mon Sep 17 00:00:00 2001 From: Hanna Reitz Date: Wed, 27 Apr 2022 13:40:55 +0200 Subject: [PATCH 06/13] qcow2: Do not reopen data_file in invalidate_cache qcow2_co_invalidate_cache() closes and opens the qcow2 file, by calling qcow2_close() and qcow2_do_open(). These two functions must thus be usable from both a global-state and an I/O context. As they are, they are not safe to call in an I/O context, because they use bdrv_unref_child() and bdrv_open_child() to close/open the data_file child, respectively, both of which are global-state functions. When used from qcow2_co_invalidate_cache(), we do not need to close/open the data_file child, though (we do not do this for bs->file or bs->backing either), and so we should skip it in the qcow2_co_invalidate_cache() path. To do so, add a parameter to qcow2_do_open() and qcow2_close() to make them skip handling s->data_file, and have qcow2_co_invalidate_cache() exempt it from the memset() on the BDRVQcow2State. (Note that the QED driver similarly closes/opens the QED image by invoking bdrv_qed_close()+bdrv_qed_do_open(), but both functions seem safe to use in an I/O context.) Fixes: https://gitlab.com/qemu-project/qemu/-/issues/945 Signed-off-by: Hanna Reitz Message-Id: <20220427114057.36651-3-hreitz@redhat.com> Reviewed-by: Eric Blake Signed-off-by: Kevin Wolf --- block/qcow2.c | 104 ++++++++++++++++++++++++++++++-------------------- 1 file changed, 62 insertions(+), 42 deletions(-) diff --git a/block/qcow2.c b/block/qcow2.c index b5c47931ef..4f5e6440fb 100644 --- a/block/qcow2.c +++ b/block/qcow2.c @@ -1296,7 +1296,8 @@ static int validate_compression_type(BDRVQcow2State *s, Error **errp) /* Called with s->lock held. */ static int coroutine_fn qcow2_do_open(BlockDriverState *bs, QDict *options, - int flags, Error **errp) + int flags, bool open_data_file, + Error **errp) { ERRP_GUARD(); BDRVQcow2State *s = bs->opaque; @@ -1614,50 +1615,52 @@ static int coroutine_fn qcow2_do_open(BlockDriverState *bs, QDict *options, goto fail; } - /* Open external data file */ - s->data_file = bdrv_open_child(NULL, options, "data-file", bs, - &child_of_bds, BDRV_CHILD_DATA, - true, errp); - if (*errp) { - ret = -EINVAL; - goto fail; - } + if (open_data_file) { + /* Open external data file */ + s->data_file = bdrv_open_child(NULL, options, "data-file", bs, + &child_of_bds, BDRV_CHILD_DATA, + true, errp); + if (*errp) { + ret = -EINVAL; + goto fail; + } - if (s->incompatible_features & QCOW2_INCOMPAT_DATA_FILE) { - if (!s->data_file && s->image_data_file) { - s->data_file = bdrv_open_child(s->image_data_file, options, - "data-file", bs, &child_of_bds, - BDRV_CHILD_DATA, false, errp); + if (s->incompatible_features & QCOW2_INCOMPAT_DATA_FILE) { + if (!s->data_file && s->image_data_file) { + s->data_file = bdrv_open_child(s->image_data_file, options, + "data-file", bs, &child_of_bds, + BDRV_CHILD_DATA, false, errp); + if (!s->data_file) { + ret = -EINVAL; + goto fail; + } + } if (!s->data_file) { + error_setg(errp, "'data-file' is required for this image"); ret = -EINVAL; goto fail; } - } - if (!s->data_file) { - error_setg(errp, "'data-file' is required for this image"); - ret = -EINVAL; - goto fail; - } - /* No data here */ - bs->file->role &= ~BDRV_CHILD_DATA; + /* No data here */ + bs->file->role &= ~BDRV_CHILD_DATA; - /* Must succeed because we have given up permissions if anything */ - bdrv_child_refresh_perms(bs, bs->file, &error_abort); - } else { - if (s->data_file) { - error_setg(errp, "'data-file' can only be set for images with an " - "external data file"); - ret = -EINVAL; - goto fail; - } + /* Must succeed because we have given up permissions if anything */ + bdrv_child_refresh_perms(bs, bs->file, &error_abort); + } else { + if (s->data_file) { + error_setg(errp, "'data-file' can only be set for images with " + "an external data file"); + ret = -EINVAL; + goto fail; + } - s->data_file = bs->file; + s->data_file = bs->file; - if (data_file_is_raw(bs)) { - error_setg(errp, "data-file-raw requires a data file"); - ret = -EINVAL; - goto fail; + if (data_file_is_raw(bs)) { + error_setg(errp, "data-file-raw requires a data file"); + ret = -EINVAL; + goto fail; + } } } @@ -1839,7 +1842,7 @@ static int coroutine_fn qcow2_do_open(BlockDriverState *bs, QDict *options, fail: g_free(s->image_data_file); - if (has_data_file(bs)) { + if (open_data_file && has_data_file(bs)) { bdrv_unref_child(bs, s->data_file); s->data_file = NULL; } @@ -1876,7 +1879,8 @@ static void coroutine_fn qcow2_open_entry(void *opaque) BDRVQcow2State *s = qoc->bs->opaque; qemu_co_mutex_lock(&s->lock); - qoc->ret = qcow2_do_open(qoc->bs, qoc->options, qoc->flags, qoc->errp); + qoc->ret = qcow2_do_open(qoc->bs, qoc->options, qoc->flags, true, + qoc->errp); qemu_co_mutex_unlock(&s->lock); } @@ -2714,7 +2718,7 @@ static int qcow2_inactivate(BlockDriverState *bs) return result; } -static void qcow2_close(BlockDriverState *bs) +static void qcow2_do_close(BlockDriverState *bs, bool close_data_file) { BDRVQcow2State *s = bs->opaque; qemu_vfree(s->l1_table); @@ -2740,7 +2744,7 @@ static void qcow2_close(BlockDriverState *bs) g_free(s->image_backing_file); g_free(s->image_backing_format); - if (has_data_file(bs)) { + if (close_data_file && has_data_file(bs)) { bdrv_unref_child(bs, s->data_file); s->data_file = NULL; } @@ -2749,11 +2753,17 @@ static void qcow2_close(BlockDriverState *bs) qcow2_free_snapshots(bs); } +static void qcow2_close(BlockDriverState *bs) +{ + qcow2_do_close(bs, true); +} + static void coroutine_fn qcow2_co_invalidate_cache(BlockDriverState *bs, Error **errp) { ERRP_GUARD(); BDRVQcow2State *s = bs->opaque; + BdrvChild *data_file; int flags = s->flags; QCryptoBlock *crypto = NULL; QDict *options; @@ -2767,14 +2777,24 @@ static void coroutine_fn qcow2_co_invalidate_cache(BlockDriverState *bs, crypto = s->crypto; s->crypto = NULL; - qcow2_close(bs); + /* + * Do not reopen s->data_file (i.e., have qcow2_do_close() not close it, + * and then prevent qcow2_do_open() from opening it), because this function + * runs in the I/O path and as such we must not invoke global-state + * functions like bdrv_unref_child() and bdrv_open_child(). + */ + qcow2_do_close(bs, false); + + data_file = s->data_file; memset(s, 0, sizeof(BDRVQcow2State)); + s->data_file = data_file; + options = qdict_clone_shallow(bs->options); flags &= ~BDRV_O_INACTIVE; qemu_co_mutex_lock(&s->lock); - ret = qcow2_do_open(bs, options, flags, errp); + ret = qcow2_do_open(bs, options, flags, false, errp); qemu_co_mutex_unlock(&s->lock); qobject_unref(options); if (ret < 0) { From ccfaf783c8eecae99b671e4e892954c9226ef5dc Mon Sep 17 00:00:00 2001 From: Hanna Reitz Date: Wed, 27 Apr 2022 13:40:56 +0200 Subject: [PATCH 07/13] Revert "main-loop: Disable GLOBAL_STATE_CODE() assertions" This reverts commit b1c073490553f80594b903ceedfc7c1aef6b1b19. (We wanted to do so once the 7.1 tree opens, which has happened. The issue reported in https://gitlab.com/qemu-project/qemu/-/issues/945 should be fixed by the preceding patches.) Signed-off-by: Hanna Reitz Message-Id: <20220427114057.36651-4-hreitz@redhat.com> Reviewed-by: Eric Blake Signed-off-by: Kevin Wolf --- include/qemu/main-loop.h | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/include/qemu/main-loop.h b/include/qemu/main-loop.h index d3750c8e76..89bd9edefb 100644 --- a/include/qemu/main-loop.h +++ b/include/qemu/main-loop.h @@ -284,8 +284,7 @@ bool qemu_in_main_thread(void); #else #define GLOBAL_STATE_CODE() \ do { \ - /* FIXME: Re-enable after 7.0 release */ \ - /* assert(qemu_in_main_thread()); */ \ + assert(qemu_in_main_thread()); \ } while (0) #endif /* CONFIG_COCOA */ From 96edda8f142928c627f93b3ad74c64edf86cc22f Mon Sep 17 00:00:00 2001 From: Hanna Reitz Date: Wed, 27 Apr 2022 13:40:57 +0200 Subject: [PATCH 08/13] iotests: Add regression test for issue 945 Create a VM with a BDS in an iothread, add -incoming defer to the command line, and then export this BDS via NBD. Doing so should not fail an assertion. Signed-off-by: Hanna Reitz Message-Id: <20220427114057.36651-5-hreitz@redhat.com> Reviewed-by: Eric Blake Tested-by: Eric Blake Signed-off-by: Kevin Wolf --- .../tests/export-incoming-iothread | 81 +++++++++++++++++++ .../tests/export-incoming-iothread.out | 5 ++ 2 files changed, 86 insertions(+) create mode 100755 tests/qemu-iotests/tests/export-incoming-iothread create mode 100644 tests/qemu-iotests/tests/export-incoming-iothread.out diff --git a/tests/qemu-iotests/tests/export-incoming-iothread b/tests/qemu-iotests/tests/export-incoming-iothread new file mode 100755 index 0000000000..7679e49103 --- /dev/null +++ b/tests/qemu-iotests/tests/export-incoming-iothread @@ -0,0 +1,81 @@ +#!/usr/bin/env python3 +# group: rw quick migration +# +# Regression test for issue 945: +# https://gitlab.com/qemu-project/qemu/-/issues/945 +# Test adding an export on top of an iothread-ed block device while in +# -incoming defer. +# +# Copyright (C) 2022 Red Hat, Inc. +# +# This program is free software; you can redistribute it and/or modify +# it under the terms of the GNU General Public License as published by +# the Free Software Foundation; either version 2 of the License, or +# (at your option) any later version. +# +# This program is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with this program. If not, see . +# + +import os +import iotests +from iotests import qemu_img_create + + +image_size = 1 * 1024 * 1024 +test_img = os.path.join(iotests.test_dir, 'test.img') +node_name = 'node0' +iothread_id = 'iothr0' + +nbd_sock = os.path.join(iotests.sock_dir, 'nbd.sock') + + +class TestExportIncomingIothread(iotests.QMPTestCase): + def setUp(self) -> None: + qemu_img_create('-f', iotests.imgfmt, test_img, str(image_size)) + + self.vm = iotests.VM() + self.vm.add_object(f'iothread,id={iothread_id}') + self.vm.add_blockdev(( + f'driver={iotests.imgfmt}', + f'node-name={node_name}', + 'file.driver=file', + f'file.filename={test_img}' + )) + self.vm.add_incoming('defer') + self.vm.launch() + + def tearDown(self): + self.vm.shutdown() + os.remove(test_img) + + def test_export_add(self): + result = self.vm.qmp('nbd-server-start', { + 'addr': { + 'type': 'unix', + 'data': { + 'path': nbd_sock + } + } + }) + self.assert_qmp(result, 'return', {}) + + # Regression test for issue 945: This should not fail an assertion + result = self.vm.qmp('block-export-add', { + 'type': 'nbd', + 'id': 'exp0', + 'node-name': node_name, + 'iothread': iothread_id + }) + self.assert_qmp(result, 'return', {}) + + +if __name__ == '__main__': + iotests.main(supported_fmts=['generic'], + unsupported_fmts=['luks'], # Would need a secret + supported_protocols=['file']) diff --git a/tests/qemu-iotests/tests/export-incoming-iothread.out b/tests/qemu-iotests/tests/export-incoming-iothread.out new file mode 100644 index 0000000000..ae1213e6f8 --- /dev/null +++ b/tests/qemu-iotests/tests/export-incoming-iothread.out @@ -0,0 +1,5 @@ +. +---------------------------------------------------------------------- +Ran 1 tests + +OK From 6d17e2879854d7d0e623c06a9286085e97bf3545 Mon Sep 17 00:00:00 2001 From: Hanna Reitz Date: Mon, 14 Mar 2022 17:27:18 +0100 Subject: [PATCH 09/13] block/vmdk: Fix reopening bs->file VMDK disk data is stored in extents, which may or may not be separate from bs->file. VmdkExtent.file points to where they are stored. Each that is stored in bs->file will simply reuse the exact pointer value of bs->file. (That is why vmdk_free_extents() will unref VmdkExtent.file (e->file) only if e->file != bs->file.) Reopen operations can change bs->file (they will replace the whole BdrvChild object, not just the BDS stored in that BdrvChild), and then we will need to change all .file pointers of all such VmdkExtents to point to the new BdrvChild. In vmdk_reopen_prepare(), we have to check which VmdkExtents are affected, and in vmdk_reopen_commit(), we can modify them. We have to split this because: - The new BdrvChild is created only after prepare, so we can change VmdkExtent.file only in commit - In commit, there no longer is any (valid) reference to the old BdrvChild object, so there would be nothing to compare VmdkExtent.file against to see whether it was equal to bs->file before reopening (There is BDRVReopenState.old_file_bs, but the old bs->file BdrvChild's .bs pointer will be NULL-ed when the new BdrvChild is created, and so we cannot compare VmdkExtent.file->bs against BDRVReopenState.old_file_bs) Signed-off-by: Hanna Reitz Message-Id: <20220314162719.65384-2-hreitz@redhat.com> Signed-off-by: Kevin Wolf --- block/vmdk.c | 56 +++++++++++++++++++++++++++++++++++++++++++++++++++- 1 file changed, 55 insertions(+), 1 deletion(-) diff --git a/block/vmdk.c b/block/vmdk.c index 37c0946066..38e5ab3806 100644 --- a/block/vmdk.c +++ b/block/vmdk.c @@ -178,6 +178,10 @@ typedef struct BDRVVmdkState { char *create_type; } BDRVVmdkState; +typedef struct BDRVVmdkReopenState { + bool *extents_using_bs_file; +} BDRVVmdkReopenState; + typedef struct VmdkMetaData { unsigned int l1_index; unsigned int l2_index; @@ -400,15 +404,63 @@ static int vmdk_is_cid_valid(BlockDriverState *bs) return 1; } -/* We have nothing to do for VMDK reopen, stubs just return success */ static int vmdk_reopen_prepare(BDRVReopenState *state, BlockReopenQueue *queue, Error **errp) { + BDRVVmdkState *s; + BDRVVmdkReopenState *rs; + int i; + assert(state != NULL); assert(state->bs != NULL); + assert(state->opaque == NULL); + + s = state->bs->opaque; + + rs = g_new0(BDRVVmdkReopenState, 1); + state->opaque = rs; + + /* + * Check whether there are any extents stored in bs->file; if bs->file + * changes, we will need to update their .file pointers to follow suit + */ + rs->extents_using_bs_file = g_new(bool, s->num_extents); + for (i = 0; i < s->num_extents; i++) { + rs->extents_using_bs_file[i] = s->extents[i].file == state->bs->file; + } + return 0; } +static void vmdk_reopen_clean(BDRVReopenState *state) +{ + BDRVVmdkReopenState *rs = state->opaque; + + g_free(rs->extents_using_bs_file); + g_free(rs); + state->opaque = NULL; +} + +static void vmdk_reopen_commit(BDRVReopenState *state) +{ + BDRVVmdkState *s = state->bs->opaque; + BDRVVmdkReopenState *rs = state->opaque; + int i; + + for (i = 0; i < s->num_extents; i++) { + if (rs->extents_using_bs_file[i]) { + s->extents[i].file = state->bs->file; + } + } + + vmdk_reopen_clean(state); +} + +static void vmdk_reopen_abort(BDRVReopenState *state) +{ + vmdk_reopen_clean(state); +} + static int vmdk_parent_open(BlockDriverState *bs) { char *p_name; @@ -3072,6 +3124,8 @@ static BlockDriver bdrv_vmdk = { .bdrv_open = vmdk_open, .bdrv_co_check = vmdk_co_check, .bdrv_reopen_prepare = vmdk_reopen_prepare, + .bdrv_reopen_commit = vmdk_reopen_commit, + .bdrv_reopen_abort = vmdk_reopen_abort, .bdrv_child_perm = bdrv_default_perms, .bdrv_co_preadv = vmdk_co_preadv, .bdrv_co_pwritev = vmdk_co_pwritev, From ecf3200703df77f9f5ac95e3cc8e1379e40cb36a Mon Sep 17 00:00:00 2001 From: Hanna Reitz Date: Mon, 14 Mar 2022 17:27:19 +0100 Subject: [PATCH 10/13] iotests/reopen-file: Test reopening file child This should work for all format drivers that support reopening, so test it. (This serves as a regression test for HEAD^: This test used to fail for VMDK before HEAD^.) Signed-off-by: Hanna Reitz Message-Id: <20220314162719.65384-3-hreitz@redhat.com> Signed-off-by: Kevin Wolf --- tests/qemu-iotests/tests/reopen-file | 89 ++++++++++++++++++++++++ tests/qemu-iotests/tests/reopen-file.out | 5 ++ 2 files changed, 94 insertions(+) create mode 100755 tests/qemu-iotests/tests/reopen-file create mode 100644 tests/qemu-iotests/tests/reopen-file.out diff --git a/tests/qemu-iotests/tests/reopen-file b/tests/qemu-iotests/tests/reopen-file new file mode 100755 index 0000000000..8590a94d53 --- /dev/null +++ b/tests/qemu-iotests/tests/reopen-file @@ -0,0 +1,89 @@ +#!/usr/bin/env python3 +# group: rw quick +# +# Test reopening a format driver's file child +# +# Copyright (C) 2022 Red Hat, Inc. +# +# This program is free software; you can redistribute it and/or modify +# it under the terms of the GNU General Public License as published by +# the Free Software Foundation; either version 2 of the License, or +# (at your option) any later version. +# +# This program is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with this program. If not, see . +# + +import os +import iotests +from iotests import imgfmt, qemu_img_create, QMPTestCase + + +image_size = 1 * 1024 * 1024 +test_img = os.path.join(iotests.test_dir, 'test.img') + + +class TestReopenFile(QMPTestCase): + def setUp(self) -> None: + res = qemu_img_create('-f', imgfmt, test_img, str(image_size)) + assert res.returncode == 0 + + # Add format driver node ('format') on top of the file ('file'), then + # add another raw node ('raw') on top of 'file' so for the reopen we + # can just switch from 'file' to 'raw' + self.vm = iotests.VM() + self.vm.add_blockdev(self.vm.qmp_to_opts({ + 'driver': imgfmt, + 'node-name': 'format', + 'file': { + 'driver': 'file', + 'node-name': 'file', + 'filename': test_img + } + })) + self.vm.add_blockdev(self.vm.qmp_to_opts({ + 'driver': 'raw', + 'node-name': 'raw', + 'file': 'file' + })) + self.vm.launch() + + def tearDown(self) -> None: + self.vm.shutdown() + os.remove(test_img) + + # Check if there was any qemu-io run that failed + if 'Pattern verification failed' in self.vm.get_log(): + print('ERROR: Pattern verification failed:') + print(self.vm.get_log()) + self.fail('qemu-io pattern verification failed') + + def test_reopen_file(self) -> None: + result = self.vm.qmp('blockdev-reopen', options=[{ + 'driver': imgfmt, + 'node-name': 'format', + 'file': 'raw' + }]) + self.assert_qmp(result, 'return', {}) + + # Do some I/O to the image to see whether it still works + # (Pattern verification will be checked by tearDown()) + result = self.vm.qmp('human-monitor-command', + command_line='qemu-io format "write -P 42 0 64k"') + self.assert_qmp(result, 'return', '') + + result = self.vm.qmp('human-monitor-command', + command_line='qemu-io format "read -P 42 0 64k"') + self.assert_qmp(result, 'return', '') + + +if __name__ == '__main__': + # Must support creating images and reopen + iotests.main(supported_fmts=['qcow', 'qcow2', 'qed', 'raw', 'vdi', 'vhdx', + 'vmdk', 'vpc'], + supported_protocols=['file']) diff --git a/tests/qemu-iotests/tests/reopen-file.out b/tests/qemu-iotests/tests/reopen-file.out new file mode 100644 index 0000000000..ae1213e6f8 --- /dev/null +++ b/tests/qemu-iotests/tests/reopen-file.out @@ -0,0 +1,5 @@ +. +---------------------------------------------------------------------- +Ran 1 tests + +OK From 34145a307d849d0b6734d0222a7aa0bb9eef7407 Mon Sep 17 00:00:00 2001 From: Stefan Hajnoczi Date: Mon, 7 Mar 2022 15:38:51 +0000 Subject: [PATCH 11/13] coroutine-ucontext: use QEMU_DEFINE_STATIC_CO_TLS() MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Thread-Local Storage variables cannot be used directly from coroutine code because the compiler may optimize TLS variable accesses across qemu_coroutine_yield() calls. When the coroutine is re-entered from another thread the TLS variables from the old thread must no longer be used. Use QEMU_DEFINE_STATIC_CO_TLS() for the current and leader variables. Signed-off-by: Stefan Hajnoczi Message-Id: <20220307153853.602859-2-stefanha@redhat.com> Reviewed-by: Philippe Mathieu-Daudé Signed-off-by: Kevin Wolf --- util/coroutine-ucontext.c | 38 ++++++++++++++++++++++++-------------- 1 file changed, 24 insertions(+), 14 deletions(-) diff --git a/util/coroutine-ucontext.c b/util/coroutine-ucontext.c index ed368e1a3e..ddc98fb4f8 100644 --- a/util/coroutine-ucontext.c +++ b/util/coroutine-ucontext.c @@ -25,6 +25,7 @@ #include "qemu/osdep.h" #include #include "qemu/coroutine_int.h" +#include "qemu/coroutine-tls.h" #ifdef CONFIG_VALGRIND_H #include @@ -66,8 +67,8 @@ typedef struct { /** * Per-thread coroutine bookkeeping */ -static __thread CoroutineUContext leader; -static __thread Coroutine *current; +QEMU_DEFINE_STATIC_CO_TLS(Coroutine *, current); +QEMU_DEFINE_STATIC_CO_TLS(CoroutineUContext, leader); /* * va_args to makecontext() must be type 'int', so passing @@ -97,14 +98,15 @@ static inline __attribute__((always_inline)) void finish_switch_fiber(void *fake_stack_save) { #ifdef CONFIG_ASAN + CoroutineUContext *leaderp = get_ptr_leader(); const void *bottom_old; size_t size_old; __sanitizer_finish_switch_fiber(fake_stack_save, &bottom_old, &size_old); - if (!leader.stack) { - leader.stack = (void *)bottom_old; - leader.stack_size = size_old; + if (!leaderp->stack) { + leaderp->stack = (void *)bottom_old; + leaderp->stack_size = size_old; } #endif #ifdef CONFIG_TSAN @@ -161,8 +163,10 @@ static void coroutine_trampoline(int i0, int i1) /* Initialize longjmp environment and switch back the caller */ if (!sigsetjmp(self->env, 0)) { - start_switch_fiber_asan(COROUTINE_YIELD, &fake_stack_save, leader.stack, - leader.stack_size); + CoroutineUContext *leaderp = get_ptr_leader(); + + start_switch_fiber_asan(COROUTINE_YIELD, &fake_stack_save, + leaderp->stack, leaderp->stack_size); start_switch_fiber_tsan(&fake_stack_save, self, true); /* true=caller */ siglongjmp(*(sigjmp_buf *)co->entry_arg, 1); } @@ -297,7 +301,7 @@ qemu_coroutine_switch(Coroutine *from_, Coroutine *to_, int ret; void *fake_stack_save = NULL; - current = to_; + set_current(to_); ret = sigsetjmp(from->env, 0); if (ret == 0) { @@ -315,18 +319,24 @@ qemu_coroutine_switch(Coroutine *from_, Coroutine *to_, Coroutine *qemu_coroutine_self(void) { - if (!current) { - current = &leader.base; + Coroutine *self = get_current(); + CoroutineUContext *leaderp = get_ptr_leader(); + + if (!self) { + self = &leaderp->base; + set_current(self); } #ifdef CONFIG_TSAN - if (!leader.tsan_co_fiber) { - leader.tsan_co_fiber = __tsan_get_current_fiber(); + if (!leaderp->tsan_co_fiber) { + leaderp->tsan_co_fiber = __tsan_get_current_fiber(); } #endif - return current; + return self; } bool qemu_in_coroutine(void) { - return current && current->caller; + Coroutine *self = get_current(); + + return self && self->caller; } From ac387a08a9c9f6b36757da912f0339c25f421f90 Mon Sep 17 00:00:00 2001 From: Stefan Hajnoczi Date: Mon, 7 Mar 2022 15:38:52 +0000 Subject: [PATCH 12/13] coroutine: use QEMU_DEFINE_STATIC_CO_TLS() MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Thread-Local Storage variables cannot be used directly from coroutine code because the compiler may optimize TLS variable accesses across qemu_coroutine_yield() calls. When the coroutine is re-entered from another thread the TLS variables from the old thread must no longer be used. Use QEMU_DEFINE_STATIC_CO_TLS() for the current and leader variables. The alloc_pool QSLIST needs a typedef so the return value of get_ptr_alloc_pool() can be stored in a local variable. One example of why this code is necessary: a coroutine that yields before calling qemu_coroutine_create() to create another coroutine is affected by the TLS issue. Signed-off-by: Stefan Hajnoczi Message-Id: <20220307153853.602859-3-stefanha@redhat.com> Reviewed-by: Philippe Mathieu-Daudé Signed-off-by: Kevin Wolf --- util/qemu-coroutine.c | 41 ++++++++++++++++++++++++----------------- 1 file changed, 24 insertions(+), 17 deletions(-) diff --git a/util/qemu-coroutine.c b/util/qemu-coroutine.c index c03b2422ff..f3e8300c8d 100644 --- a/util/qemu-coroutine.c +++ b/util/qemu-coroutine.c @@ -18,6 +18,7 @@ #include "qemu/atomic.h" #include "qemu/coroutine.h" #include "qemu/coroutine_int.h" +#include "qemu/coroutine-tls.h" #include "block/aio.h" /** Initial batch size is 64, and is increased on demand */ @@ -29,17 +30,20 @@ enum { static QSLIST_HEAD(, Coroutine) release_pool = QSLIST_HEAD_INITIALIZER(pool); static unsigned int pool_batch_size = POOL_INITIAL_BATCH_SIZE; static unsigned int release_pool_size; -static __thread QSLIST_HEAD(, Coroutine) alloc_pool = QSLIST_HEAD_INITIALIZER(pool); -static __thread unsigned int alloc_pool_size; -static __thread Notifier coroutine_pool_cleanup_notifier; + +typedef QSLIST_HEAD(, Coroutine) CoroutineQSList; +QEMU_DEFINE_STATIC_CO_TLS(CoroutineQSList, alloc_pool); +QEMU_DEFINE_STATIC_CO_TLS(unsigned int, alloc_pool_size); +QEMU_DEFINE_STATIC_CO_TLS(Notifier, coroutine_pool_cleanup_notifier); static void coroutine_pool_cleanup(Notifier *n, void *value) { Coroutine *co; Coroutine *tmp; + CoroutineQSList *alloc_pool = get_ptr_alloc_pool(); - QSLIST_FOREACH_SAFE(co, &alloc_pool, pool_next, tmp) { - QSLIST_REMOVE_HEAD(&alloc_pool, pool_next); + QSLIST_FOREACH_SAFE(co, alloc_pool, pool_next, tmp) { + QSLIST_REMOVE_HEAD(alloc_pool, pool_next); qemu_coroutine_delete(co); } } @@ -49,27 +53,30 @@ Coroutine *qemu_coroutine_create(CoroutineEntry *entry, void *opaque) Coroutine *co = NULL; if (CONFIG_COROUTINE_POOL) { - co = QSLIST_FIRST(&alloc_pool); + CoroutineQSList *alloc_pool = get_ptr_alloc_pool(); + + co = QSLIST_FIRST(alloc_pool); if (!co) { if (release_pool_size > qatomic_read(&pool_batch_size)) { /* Slow path; a good place to register the destructor, too. */ - if (!coroutine_pool_cleanup_notifier.notify) { - coroutine_pool_cleanup_notifier.notify = coroutine_pool_cleanup; - qemu_thread_atexit_add(&coroutine_pool_cleanup_notifier); + Notifier *notifier = get_ptr_coroutine_pool_cleanup_notifier(); + if (!notifier->notify) { + notifier->notify = coroutine_pool_cleanup; + qemu_thread_atexit_add(notifier); } /* This is not exact; there could be a little skew between * release_pool_size and the actual size of release_pool. But * it is just a heuristic, it does not need to be perfect. */ - alloc_pool_size = qatomic_xchg(&release_pool_size, 0); - QSLIST_MOVE_ATOMIC(&alloc_pool, &release_pool); - co = QSLIST_FIRST(&alloc_pool); + set_alloc_pool_size(qatomic_xchg(&release_pool_size, 0)); + QSLIST_MOVE_ATOMIC(alloc_pool, &release_pool); + co = QSLIST_FIRST(alloc_pool); } } if (co) { - QSLIST_REMOVE_HEAD(&alloc_pool, pool_next); - alloc_pool_size--; + QSLIST_REMOVE_HEAD(alloc_pool, pool_next); + set_alloc_pool_size(get_alloc_pool_size() - 1); } } @@ -93,9 +100,9 @@ static void coroutine_delete(Coroutine *co) qatomic_inc(&release_pool_size); return; } - if (alloc_pool_size < qatomic_read(&pool_batch_size)) { - QSLIST_INSERT_HEAD(&alloc_pool, co, pool_next); - alloc_pool_size++; + if (get_alloc_pool_size() < qatomic_read(&pool_batch_size)) { + QSLIST_INSERT_HEAD(get_ptr_alloc_pool(), co, pool_next); + set_alloc_pool_size(get_alloc_pool_size() + 1); return; } } From c1fe694357a328c807ae3cc6961c19e923448fcc Mon Sep 17 00:00:00 2001 From: Stefan Hajnoczi Date: Mon, 7 Mar 2022 15:38:53 +0000 Subject: [PATCH 13/13] coroutine-win32: use QEMU_DEFINE_STATIC_CO_TLS() MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Thread-Local Storage variables cannot be used directly from coroutine code because the compiler may optimize TLS variable accesses across qemu_coroutine_yield() calls. When the coroutine is re-entered from another thread the TLS variables from the old thread must no longer be used. Use QEMU_DEFINE_STATIC_CO_TLS() for the current and leader variables. I think coroutine-win32.c could get away with __thread because the variables are only used in situations where either the stale value is correct (current) or outside coroutine context (loading leader when current is NULL). Due to the difficulty of being sure that this is really safe in all scenarios it seems worth converting it anyway. Signed-off-by: Stefan Hajnoczi Message-Id: <20220307153853.602859-4-stefanha@redhat.com> Reviewed-by: Philippe Mathieu-Daudé Signed-off-by: Kevin Wolf --- util/coroutine-win32.c | 18 +++++++++++++----- 1 file changed, 13 insertions(+), 5 deletions(-) diff --git a/util/coroutine-win32.c b/util/coroutine-win32.c index c196f956d2..7db2e8f8c8 100644 --- a/util/coroutine-win32.c +++ b/util/coroutine-win32.c @@ -24,6 +24,7 @@ #include "qemu/osdep.h" #include "qemu/coroutine_int.h" +#include "qemu/coroutine-tls.h" typedef struct { @@ -33,8 +34,8 @@ typedef struct CoroutineAction action; } CoroutineWin32; -static __thread CoroutineWin32 leader; -static __thread Coroutine *current; +QEMU_DEFINE_STATIC_CO_TLS(CoroutineWin32, leader); +QEMU_DEFINE_STATIC_CO_TLS(Coroutine *, current); /* This function is marked noinline to prevent GCC from inlining it * into coroutine_trampoline(). If we allow it to do that then it @@ -51,7 +52,7 @@ qemu_coroutine_switch(Coroutine *from_, Coroutine *to_, CoroutineWin32 *from = DO_UPCAST(CoroutineWin32, base, from_); CoroutineWin32 *to = DO_UPCAST(CoroutineWin32, base, to_); - current = to_; + set_current(to_); to->action = action; SwitchToFiber(to->fiber); @@ -88,14 +89,21 @@ void qemu_coroutine_delete(Coroutine *co_) Coroutine *qemu_coroutine_self(void) { + Coroutine *current = get_current(); + if (!current) { - current = &leader.base; - leader.fiber = ConvertThreadToFiber(NULL); + CoroutineWin32 *leader = get_ptr_leader(); + + current = &leader->base; + set_current(current); + leader->fiber = ConvertThreadToFiber(NULL); } return current; } bool qemu_in_coroutine(void) { + Coroutine *current = get_current(); + return current && current->caller; }