From b67de91e0d4cae883a038f31b2207a1aaef5e51c Mon Sep 17 00:00:00 2001 From: Bin Meng Date: Tue, 19 Jul 2022 21:50:14 +0800 Subject: [PATCH 01/14] docs: Add caveats for Windows as the build platform Commit cf60ccc3306c ("cutils: Introduce bundle mechanism") introduced a Python script to populate a bundle directory using os.symlink() to point to the binaries in the pc-bios directory of the source tree. Commit 882084a04ae9 ("datadir: Use bundle mechanism") removed previous logic in pc-bios/meson.build to create a link/copy of pc-bios binaries in the build tree so os.symlink() is the way to go. However os.symlink() may fail [1] on Windows if an unprivileged Windows user started the QEMU build process, which results in QEMU executables generated in the build tree not able to load the default BIOS/firmware images due to symbolic links not present in the bundle directory. This commits updates the documentation by adding such caveats for users who want to build QEMU on the Windows platform. [1] https://docs.python.org/3/library/os.html#os.symlink Signed-off-by: Bin Meng Reviewed-by: Stefan Weil Reviewed-by: Akihiko Odaki Message-Id: <20220719135014.764981-1-bmeng.cn@gmail.com> Signed-off-by: Paolo Bonzini --- docs/about/build-platforms.rst | 10 +++++++++- 1 file changed, 9 insertions(+), 1 deletion(-) diff --git a/docs/about/build-platforms.rst b/docs/about/build-platforms.rst index ebde20f981..6b8496c430 100644 --- a/docs/about/build-platforms.rst +++ b/docs/about/build-platforms.rst @@ -94,8 +94,16 @@ not tested anymore, so it is recommended to use one of the latest versions of Windows instead. The project supports building QEMU with current versions of the MinGW -toolchain, either hosted on Linux (Debian/Fedora) or via MSYS2 on Windows. +toolchain, either hosted on Linux (Debian/Fedora) or via `MSYS2`_ on Windows. +A more recent Windows version is always preferred as it is less likely to have +problems with building via MSYS2. The building process of QEMU involves some +Python scripts that call os.symlink() which needs special attention for the +build process to successfully complete. On newer versions of Windows 10, +unprivileged accounts can create symlinks if Developer Mode is enabled. +When Developer Mode is not available/enabled, the SeCreateSymbolicLinkPrivilege +privilege is required, or the process must be run as an administrator. .. _Homebrew: https://brew.sh/ .. _MacPorts: https://www.macports.org/ +.. _MSYS2: https://www.msys2.org/ .. _Repology: https://repology.org/ From d12dd9c7ee0ecab96efc21a263772fef2fff3ac6 Mon Sep 17 00:00:00 2001 From: Peter Maydell Date: Tue, 19 Jul 2022 14:48:53 +0100 Subject: [PATCH 02/14] accel/kvm: Avoid Coverity warning in query_stats() Coverity complains that there is a codepath in the query_stats() function where it can leak the memory pointed to by stats_list. This can only happen if the caller passes something other than STATS_TARGET_VM or STATS_TARGET_VCPU as the 'target', which no callsite does. Enforce this assumption using g_assert_not_reached(), so that if we have a future bug we hit the assert rather than silently leaking memory. Resolves: Coverity CID 1490140 Fixes: cc01a3f4cadd91e6 ("kvm: Support for querying fd-based stats") Signed-off-by: Peter Maydell Message-Id: <20220719134853.327059-1-peter.maydell@linaro.org> Signed-off-by: Paolo Bonzini --- accel/kvm/kvm-all.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c index 99aede73b7..f165074e99 100644 --- a/accel/kvm/kvm-all.c +++ b/accel/kvm/kvm-all.c @@ -4014,7 +4014,7 @@ static void query_stats(StatsResultList **result, StatsTarget target, stats_list); break; default: - break; + g_assert_not_reached(); } } From d5b50236915be6f48e9ade9152273f0e902c63be Mon Sep 17 00:00:00 2001 From: Paolo Bonzini Date: Wed, 20 Jul 2022 10:40:06 +0200 Subject: [PATCH 03/14] oss-fuzz: remove binaries from qemu-bundle tree oss-fuzz is finding possible fuzzing targets even under qemu-bundle/.../bin, but they cannot be used because the required shared libraries are missing. Since the fuzzing targets are already placed manually in $OUT, the bindir and libexecdir subtrees are not needed; remove them. Cc: Alexander Bulekov Signed-off-by: Paolo Bonzini --- scripts/oss-fuzz/build.sh | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/scripts/oss-fuzz/build.sh b/scripts/oss-fuzz/build.sh index 2656a89aea..5ee9141e3e 100755 --- a/scripts/oss-fuzz/build.sh +++ b/scripts/oss-fuzz/build.sh @@ -87,8 +87,10 @@ if [ "$GITLAB_CI" != "true" ]; then make "-j$(nproc)" qemu-fuzz-i386 V=1 fi -# Prepare a preinstalled tree +# Place data files in the preinstall tree make install DESTDIR=$DEST_DIR/qemu-bundle +rm -rf $DEST_DIR/qemu-bundle/opt/qemu-oss-fuzz/bin +rm -rf $DEST_DIR/qemu-bundle/opt/qemu-oss-fuzz/libexec targets=$(./qemu-fuzz-i386 | awk '$1 ~ /\*/ {print $2}') base_copy="$DEST_DIR/qemu-fuzz-i386-target-$(echo "$targets" | head -n 1)" From 7906f11e62c39cdbf8edc274cb311a420d675371 Mon Sep 17 00:00:00 2001 From: Alexander Bulekov Date: Wed, 20 Jul 2022 14:09:46 -0400 Subject: [PATCH 04/14] oss-fuzz: ensure base_copy is a generic-fuzzer Depending on how the target list is sorted in by qemu, the first target (used as the base copy of the fuzzer, to which all others are linked) might not be a generic-fuzzer. Since we are trying to only use generic-fuzz, on oss-fuzz, fix that, to ensure the base copy is a generic-fuzzer. Signed-off-by: Alexander Bulekov Message-Id: <20220720180946.2264253-1-alxndr@bu.edu> Signed-off-by: Paolo Bonzini --- scripts/oss-fuzz/build.sh | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/scripts/oss-fuzz/build.sh b/scripts/oss-fuzz/build.sh index 5ee9141e3e..3bda0d72c7 100755 --- a/scripts/oss-fuzz/build.sh +++ b/scripts/oss-fuzz/build.sh @@ -92,7 +92,7 @@ make install DESTDIR=$DEST_DIR/qemu-bundle rm -rf $DEST_DIR/qemu-bundle/opt/qemu-oss-fuzz/bin rm -rf $DEST_DIR/qemu-bundle/opt/qemu-oss-fuzz/libexec -targets=$(./qemu-fuzz-i386 | awk '$1 ~ /\*/ {print $2}') +targets=$(./qemu-fuzz-i386 | grep generic-fuzz | awk '$1 ~ /\*/ {print $2}') base_copy="$DEST_DIR/qemu-fuzz-i386-target-$(echo "$targets" | head -n 1)" cp "./qemu-fuzz-i386" "$base_copy" From 6b23a6791685ac1bf071c0ebb35004e427c1806d Mon Sep 17 00:00:00 2001 From: "Jason A. Donenfeld" Date: Tue, 19 Jul 2022 14:01:13 +0200 Subject: [PATCH 05/14] hw/nios2: virt: pass random seed to fdt If the FDT contains /chosen/rng-seed, then the Linux RNG will use it to initialize early. Set this using the usual guest random number generation function. This FDT node is part of the DT specification. Cc: Chris Wulff Cc: Marek Vasut Signed-off-by: Jason A. Donenfeld Message-Id: <20220719120113.118034-1-Jason@zx2c4.com> Signed-off-by: Paolo Bonzini --- hw/nios2/boot.c | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/hw/nios2/boot.c b/hw/nios2/boot.c index 07b8d87633..21cbffff47 100644 --- a/hw/nios2/boot.c +++ b/hw/nios2/boot.c @@ -34,6 +34,7 @@ #include "qemu/option.h" #include "qemu/config-file.h" #include "qemu/error-report.h" +#include "qemu/guest-random.h" #include "sysemu/device_tree.h" #include "sysemu/reset.h" #include "hw/boards.h" @@ -83,6 +84,7 @@ static int nios2_load_dtb(struct nios2_boot_info bi, const uint32_t ramsize, int fdt_size; void *fdt = NULL; int r; + uint8_t rng_seed[32]; if (dtb_filename) { fdt = load_device_tree(dtb_filename, &fdt_size); @@ -91,6 +93,9 @@ static int nios2_load_dtb(struct nios2_boot_info bi, const uint32_t ramsize, return 0; } + qemu_guest_getrandom_nofail(rng_seed, sizeof(rng_seed)); + qemu_fdt_setprop(fdt, "/chosen", "rng-seed", rng_seed, sizeof(rng_seed)); + if (kernel_cmdline) { r = qemu_fdt_setprop_string(fdt, "/chosen", "bootargs", kernel_cmdline); From 5e19cc68fb42c6ecabe5cf37012c887d25ffd144 Mon Sep 17 00:00:00 2001 From: "Jason A. Donenfeld" Date: Tue, 19 Jul 2022 14:08:43 +0200 Subject: [PATCH 06/14] hw/mips: boston: pass random seed to fdt MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit If the FDT contains /chosen/rng-seed, then the Linux RNG will use it to initialize early. Set this using the usual guest random number generation function. This FDT node is part of the DT specification. I'd do the same for other MIPS platforms but boston is the only one that seems to use FDT. Cc: Paul Burton Cc: Aleksandar Rikalo Cc: Philippe Mathieu-Daudé Signed-off-by: Jason A. Donenfeld Message-Id: <20220719120843.134392-1-Jason@zx2c4.com> Signed-off-by: Paolo Bonzini --- hw/mips/boston.c | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/hw/mips/boston.c b/hw/mips/boston.c index 1debca18ec..d2ab9da1a0 100644 --- a/hw/mips/boston.c +++ b/hw/mips/boston.c @@ -34,6 +34,7 @@ #include "hw/qdev-properties.h" #include "qapi/error.h" #include "qemu/error-report.h" +#include "qemu/guest-random.h" #include "qemu/log.h" #include "chardev/char.h" #include "sysemu/device_tree.h" @@ -363,6 +364,7 @@ static const void *boston_fdt_filter(void *opaque, const void *fdt_orig, size_t ram_low_sz, ram_high_sz; size_t fdt_sz = fdt_totalsize(fdt_orig) * 2; g_autofree void *fdt = g_malloc0(fdt_sz); + uint8_t rng_seed[32]; err = fdt_open_into(fdt_orig, fdt, fdt_sz); if (err) { @@ -370,6 +372,9 @@ static const void *boston_fdt_filter(void *opaque, const void *fdt_orig, return NULL; } + qemu_guest_getrandom_nofail(rng_seed, sizeof(rng_seed)); + qemu_fdt_setprop(fdt, "/chosen", "rng-seed", rng_seed, sizeof(rng_seed)); + cmdline = (machine->kernel_cmdline && machine->kernel_cmdline[0]) ? machine->kernel_cmdline : " "; err = qemu_fdt_setprop_string(fdt, "/chosen", "bootargs", cmdline); From c287941a4dd5570e4221b0a58590f7231f896e51 Mon Sep 17 00:00:00 2001 From: "Jason A. Donenfeld" Date: Tue, 19 Jul 2022 14:20:33 +0200 Subject: [PATCH 07/14] hw/rx: pass random seed to fdt If the FDT contains /chosen/rng-seed, then the Linux RNG will use it to initialize early. Set this using the usual guest random number generation function. This FDT node is part of the DT specification. Cc: Yoshinori Sato Signed-off-by: Jason A. Donenfeld Message-Id: <20220719122033.135902-1-Jason@zx2c4.com> Signed-off-by: Paolo Bonzini --- hw/rx/rx-gdbsim.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/hw/rx/rx-gdbsim.c b/hw/rx/rx-gdbsim.c index be147b4bd9..8ffe1b8035 100644 --- a/hw/rx/rx-gdbsim.c +++ b/hw/rx/rx-gdbsim.c @@ -19,6 +19,7 @@ #include "qemu/osdep.h" #include "qemu/cutils.h" #include "qemu/error-report.h" +#include "qemu/guest-random.h" #include "qapi/error.h" #include "hw/loader.h" #include "hw/rx/rx62n.h" @@ -83,6 +84,7 @@ static void rx_gdbsim_init(MachineState *machine) MemoryRegion *sysmem = get_system_memory(); const char *kernel_filename = machine->kernel_filename; const char *dtb_filename = machine->dtb; + uint8_t rng_seed[32]; if (machine->ram_size < mc->default_ram_size) { char *sz = size_to_str(mc->default_ram_size); @@ -140,6 +142,8 @@ static void rx_gdbsim_init(MachineState *machine) error_report("Couldn't set /chosen/bootargs"); exit(1); } + qemu_guest_getrandom_nofail(rng_seed, sizeof(rng_seed)); + qemu_fdt_setprop(dtb, "/chosen", "rng-seed", rng_seed, sizeof(rng_seed)); /* DTB is located at the end of SDRAM space. */ dtb_offset = ROUND_DOWN(machine->ram_size - dtb_size, 16); rom_add_blob_fixed("dtb", dtb, dtb_size, From 67f7e426e53833a5db75b0d813e8d537b8a75bd2 Mon Sep 17 00:00:00 2001 From: "Jason A. Donenfeld" Date: Thu, 21 Jul 2022 14:56:36 +0200 Subject: [PATCH 08/14] hw/i386: pass RNG seed via setup_data entry MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Tiny machines optimized for fast boot time generally don't use EFI, which means a random seed has to be supplied some other way. For this purpose, Linux (≥5.20) supports passing a seed in the setup_data table with SETUP_RNG_SEED, specially intended for hypervisors, kexec, and specialized bootloaders. The linked commit shows the upstream kernel implementation. At Paolo's request, we don't pass these to versioned machine types ≤7.0. Link: https://git.kernel.org/tip/tip/c/68b8e9713c8 Cc: Marcel Apfelbaum Cc: Paolo Bonzini Cc: Richard Henderson Cc: Eduardo Habkost Cc: Peter Maydell Cc: Philippe Mathieu-Daudé Cc: Laurent Vivier Reviewed-by: Michael S. Tsirkin Signed-off-by: Jason A. Donenfeld Message-Id: <20220721125636.446842-1-Jason@zx2c4.com> Signed-off-by: Paolo Bonzini --- hw/i386/microvm.c | 2 +- hw/i386/pc.c | 4 +-- hw/i386/pc_piix.c | 2 ++ hw/i386/pc_q35.c | 2 ++ hw/i386/x86.c | 26 +++++++++++++++++--- include/hw/i386/pc.h | 3 +++ include/hw/i386/x86.h | 3 ++- include/standard-headers/asm-x86/bootparam.h | 1 + 8 files changed, 35 insertions(+), 8 deletions(-) diff --git a/hw/i386/microvm.c b/hw/i386/microvm.c index dc929727dc..7fe8cce03e 100644 --- a/hw/i386/microvm.c +++ b/hw/i386/microvm.c @@ -332,7 +332,7 @@ static void microvm_memory_init(MicrovmMachineState *mms) rom_set_fw(fw_cfg); if (machine->kernel_filename != NULL) { - x86_load_linux(x86ms, fw_cfg, 0, true); + x86_load_linux(x86ms, fw_cfg, 0, true, false); } if (mms->option_roms) { diff --git a/hw/i386/pc.c b/hw/i386/pc.c index 774cb2bf07..d2b5823ffb 100644 --- a/hw/i386/pc.c +++ b/hw/i386/pc.c @@ -796,7 +796,7 @@ void xen_load_linux(PCMachineState *pcms) rom_set_fw(fw_cfg); x86_load_linux(x86ms, fw_cfg, pcmc->acpi_data_size, - pcmc->pvh_enabled); + pcmc->pvh_enabled, pcmc->legacy_no_rng_seed); for (i = 0; i < nb_option_roms; i++) { assert(!strcmp(option_rom[i].name, "linuxboot.bin") || !strcmp(option_rom[i].name, "linuxboot_dma.bin") || @@ -992,7 +992,7 @@ void pc_memory_init(PCMachineState *pcms, if (linux_boot) { x86_load_linux(x86ms, fw_cfg, pcmc->acpi_data_size, - pcmc->pvh_enabled); + pcmc->pvh_enabled, pcmc->legacy_no_rng_seed); } for (i = 0; i < nb_option_roms; i++) { diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c index a234989ac3..fbf9465318 100644 --- a/hw/i386/pc_piix.c +++ b/hw/i386/pc_piix.c @@ -438,9 +438,11 @@ DEFINE_I440FX_MACHINE(v7_1, "pc-i440fx-7.1", NULL, static void pc_i440fx_7_0_machine_options(MachineClass *m) { + PCMachineClass *pcmc = PC_MACHINE_CLASS(m); pc_i440fx_7_1_machine_options(m); m->alias = NULL; m->is_default = false; + pcmc->legacy_no_rng_seed = true; compat_props_add(m->compat_props, hw_compat_7_0, hw_compat_7_0_len); compat_props_add(m->compat_props, pc_compat_7_0, pc_compat_7_0_len); } diff --git a/hw/i386/pc_q35.c b/hw/i386/pc_q35.c index f96cbd04e2..12cc76aaf8 100644 --- a/hw/i386/pc_q35.c +++ b/hw/i386/pc_q35.c @@ -375,8 +375,10 @@ DEFINE_Q35_MACHINE(v7_1, "pc-q35-7.1", NULL, static void pc_q35_7_0_machine_options(MachineClass *m) { + PCMachineClass *pcmc = PC_MACHINE_CLASS(m); pc_q35_7_1_machine_options(m); m->alias = NULL; + pcmc->legacy_no_rng_seed = true; compat_props_add(m->compat_props, hw_compat_7_0, hw_compat_7_0_len); compat_props_add(m->compat_props, pc_compat_7_0, pc_compat_7_0_len); } diff --git a/hw/i386/x86.c b/hw/i386/x86.c index 6003b4b2df..ecea25d249 100644 --- a/hw/i386/x86.c +++ b/hw/i386/x86.c @@ -26,6 +26,7 @@ #include "qemu/cutils.h" #include "qemu/units.h" #include "qemu/datadir.h" +#include "qemu/guest-random.h" #include "qapi/error.h" #include "qapi/qmp/qerror.h" #include "qapi/qapi-visit-common.h" @@ -766,7 +767,8 @@ static bool load_elfboot(const char *kernel_filename, void x86_load_linux(X86MachineState *x86ms, FWCfgState *fw_cfg, int acpi_data_size, - bool pvh_enabled) + bool pvh_enabled, + bool legacy_no_rng_seed) { bool linuxboot_dma_enabled = X86_MACHINE_GET_CLASS(x86ms)->fwcfg_dma_enabled; uint16_t protocol; @@ -774,7 +776,7 @@ void x86_load_linux(X86MachineState *x86ms, int dtb_size, setup_data_offset; uint32_t initrd_max; uint8_t header[8192], *setup, *kernel; - hwaddr real_addr, prot_addr, cmdline_addr, initrd_addr = 0; + hwaddr real_addr, prot_addr, cmdline_addr, initrd_addr = 0, first_setup_data = 0; FILE *f; char *vmode; MachineState *machine = MACHINE(x86ms); @@ -784,6 +786,7 @@ void x86_load_linux(X86MachineState *x86ms, const char *dtb_filename = machine->dtb; const char *kernel_cmdline = machine->kernel_cmdline; SevKernelLoaderContext sev_load_ctx = {}; + enum { RNG_SEED_LENGTH = 32 }; /* Align to 16 bytes as a paranoia measure */ cmdline_size = (strlen(kernel_cmdline) + 16) & ~15; @@ -1063,16 +1066,31 @@ void x86_load_linux(X86MachineState *x86ms, kernel_size = setup_data_offset + sizeof(struct setup_data) + dtb_size; kernel = g_realloc(kernel, kernel_size); - stq_p(header + 0x250, prot_addr + setup_data_offset); setup_data = (struct setup_data *)(kernel + setup_data_offset); - setup_data->next = 0; + setup_data->next = cpu_to_le64(first_setup_data); + first_setup_data = prot_addr + setup_data_offset; setup_data->type = cpu_to_le32(SETUP_DTB); setup_data->len = cpu_to_le32(dtb_size); load_image_size(dtb_filename, setup_data->data, dtb_size); } + if (!legacy_no_rng_seed) { + setup_data_offset = QEMU_ALIGN_UP(kernel_size, 16); + kernel_size = setup_data_offset + sizeof(struct setup_data) + RNG_SEED_LENGTH; + kernel = g_realloc(kernel, kernel_size); + setup_data = (struct setup_data *)(kernel + setup_data_offset); + setup_data->next = cpu_to_le64(first_setup_data); + first_setup_data = prot_addr + setup_data_offset; + setup_data->type = cpu_to_le32(SETUP_RNG_SEED); + setup_data->len = cpu_to_le32(RNG_SEED_LENGTH); + qemu_guest_getrandom_nofail(setup_data->data, RNG_SEED_LENGTH); + } + + /* Offset 0x250 is a pointer to the first setup_data link. */ + stq_p(header + 0x250, first_setup_data); + /* * If we're starting an encrypted VM, it will be OVMF based, which uses the * efi stub for booting and doesn't require any values to be placed in the diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h index b7735dccfc..2a8ffbcfa8 100644 --- a/include/hw/i386/pc.h +++ b/include/hw/i386/pc.h @@ -127,6 +127,9 @@ struct PCMachineClass { /* create kvmclock device even when KVM PV features are not exposed */ bool kvmclock_create_always; + + /* skip passing an rng seed for legacy machines */ + bool legacy_no_rng_seed; }; #define TYPE_PC_MACHINE "generic-pc-machine" diff --git a/include/hw/i386/x86.h b/include/hw/i386/x86.h index 9089bdd99c..6bdf1f6ab2 100644 --- a/include/hw/i386/x86.h +++ b/include/hw/i386/x86.h @@ -123,7 +123,8 @@ void x86_bios_rom_init(MachineState *ms, const char *default_firmware, void x86_load_linux(X86MachineState *x86ms, FWCfgState *fw_cfg, int acpi_data_size, - bool pvh_enabled); + bool pvh_enabled, + bool legacy_no_rng_seed); bool x86_machine_is_smm_enabled(const X86MachineState *x86ms); bool x86_machine_is_acpi_enabled(const X86MachineState *x86ms); diff --git a/include/standard-headers/asm-x86/bootparam.h b/include/standard-headers/asm-x86/bootparam.h index 072e2ed546..b2aaad10e5 100644 --- a/include/standard-headers/asm-x86/bootparam.h +++ b/include/standard-headers/asm-x86/bootparam.h @@ -10,6 +10,7 @@ #define SETUP_EFI 4 #define SETUP_APPLE_PROPERTIES 5 #define SETUP_JAILHOUSE 6 +#define SETUP_RNG_SEED 9 #define SETUP_INDIRECT (1<<31) From bd4b7fd6ba98e2d660356f1f52edcec5c51b0991 Mon Sep 17 00:00:00 2001 From: Helge Deller Date: Mon, 18 Jul 2022 18:40:43 +0200 Subject: [PATCH 09/14] linux-user/hppa: Fix segfaults on page zero This program: int main(void) { asm("bv %r0(%r0)"); return 0; } produces on real hppa hardware the expected segfault: SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_MAPERR, si_addr=0x3} --- killed by SIGSEGV +++ Segmentation fault But when run on linux-user you get instead internal qemu errors: ERROR: linux-user/hppa/cpu_loop.c:172:cpu_loop: code should not be reached Bail out! ERROR: linux-user/hppa/cpu_loop.c:172:cpu_loop: code should not be reached ERROR: accel/tcg/cpu-exec.c:933:cpu_exec: assertion failed: (cpu == current_cpu) Bail out! ERROR: accel/tcg/cpu-exec.c:933:cpu_exec: assertion failed: (cpu == current_cpu) Fix it by adding the missing case for the EXCP_IMP trap in cpu_loop() and raise a segfault. Signed-off-by: Helge Deller Reviewed-by: Peter Maydell Message-Id: Signed-off-by: Laurent Vivier --- linux-user/hppa/cpu_loop.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/linux-user/hppa/cpu_loop.c b/linux-user/hppa/cpu_loop.c index a576d1a249..64263c3dc4 100644 --- a/linux-user/hppa/cpu_loop.c +++ b/linux-user/hppa/cpu_loop.c @@ -143,6 +143,9 @@ void cpu_loop(CPUHPPAState *env) env->iaoq_f = env->gr[31]; env->iaoq_b = env->gr[31] + 4; break; + case EXCP_IMP: + force_sig_fault(TARGET_SIGSEGV, TARGET_SEGV_MAPERR, env->iaoq_f); + break; case EXCP_ILL: force_sig_fault(TARGET_SIGILL, TARGET_ILL_ILLOPN, env->iaoq_f); break; From 499d8055379f5beb2ca155c668eca52b8a24321a Mon Sep 17 00:00:00 2001 From: Helge Deller Date: Tue, 19 Jul 2022 18:20:42 +0200 Subject: [PATCH 10/14] linux-user: Unconditionally use pipe2() syscall The pipe2() syscall is available on all Linux platforms since kernel 2.6.27, so use it unconditionally to emulate pipe() and pipe2(). Signed-off-by: Helge Deller Reviewed-by: Peter Maydell Message-Id: Signed-off-by: Laurent Vivier --- linux-user/syscall.c | 11 +---------- meson.build | 9 --------- 2 files changed, 1 insertion(+), 19 deletions(-) diff --git a/linux-user/syscall.c b/linux-user/syscall.c index 991b85e6b4..4f89184d05 100644 --- a/linux-user/syscall.c +++ b/linux-user/syscall.c @@ -1586,21 +1586,12 @@ static abi_long do_ppoll(abi_long arg1, abi_long arg2, abi_long arg3, } #endif -static abi_long do_pipe2(int host_pipe[], int flags) -{ -#ifdef CONFIG_PIPE2 - return pipe2(host_pipe, flags); -#else - return -ENOSYS; -#endif -} - static abi_long do_pipe(CPUArchState *cpu_env, abi_ulong pipedes, int flags, int is_pipe2) { int host_pipe[2]; abi_long ret; - ret = flags ? do_pipe2(host_pipe, flags) : pipe(host_pipe); + ret = pipe2(host_pipe, flags); if (is_error(ret)) return get_errno(ret); diff --git a/meson.build b/meson.build index 8a8c415fc1..75aaca8462 100644 --- a/meson.build +++ b/meson.build @@ -2026,15 +2026,6 @@ config_host_data.set('CONFIG_OPEN_BY_HANDLE', cc.links(gnu_source_prefix + ''' #else int main(void) { struct file_handle fh; return open_by_handle_at(0, &fh, 0); } #endif''')) -config_host_data.set('CONFIG_PIPE2', cc.links(gnu_source_prefix + ''' - #include - #include - - int main(void) - { - int pipefd[2]; - return pipe2(pipefd, O_CLOEXEC); - }''')) config_host_data.set('CONFIG_POSIX_MADVISE', cc.links(gnu_source_prefix + ''' #include #include From 6f200f51869ff0de7ea0343dd7104362e994b382 Mon Sep 17 00:00:00 2001 From: Helge Deller Date: Sun, 17 Jul 2022 18:21:53 +0200 Subject: [PATCH 11/14] linux-user: Use target abi_int type for pipefd[1] in pipe() When writing back the fd[1] pipe file handle to emulated userspace memory, use sizeof(abi_int) as offset insted of the hosts's int type. There is no functional change in this patch. Signed-off-by: Helge Deller Reviewed-by: Richard Henderson Message-Id: Signed-off-by: Laurent Vivier --- linux-user/syscall.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/linux-user/syscall.c b/linux-user/syscall.c index 4f89184d05..b27a6552aa 100644 --- a/linux-user/syscall.c +++ b/linux-user/syscall.c @@ -1615,7 +1615,7 @@ static abi_long do_pipe(CPUArchState *cpu_env, abi_ulong pipedes, } if (put_user_s32(host_pipe[0], pipedes) - || put_user_s32(host_pipe[1], pipedes + sizeof(host_pipe[0]))) + || put_user_s32(host_pipe[1], pipedes + sizeof(abi_int))) return -TARGET_EFAULT; return get_errno(ret); } From dd0ef128669c29734a197ca9195e7ab64e20ba2c Mon Sep 17 00:00:00 2001 From: Ake Koomsin Date: Wed, 20 Jul 2022 20:13:03 +0900 Subject: [PATCH 12/14] e1000e: Fix possible interrupt loss when using MSI Commit "e1000e: Prevent MSI/MSI-X storms" introduced msi_causes_pending to prevent interrupt storms problem. It was tested with MSI-X. In case of MSI, the guest can rely solely on interrupts to clear ICR. Upon clearing all pending interrupts, msi_causes_pending gets cleared. However, when e1000e_itr_should_postpone() in e1000e_send_msi() returns true, MSI never gets fired by e1000e_intrmgr_on_throttling_timer() because msi_causes_pending is still set. This results in interrupt loss. To prevent this, we need to clear msi_causes_pending when MSI is going to get fired by the throttling timer. The guest can then receive interrupts eventually. Signed-off-by: Ake Koomsin Signed-off-by: Jason Wang --- hw/net/e1000e_core.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/hw/net/e1000e_core.c b/hw/net/e1000e_core.c index 2c51089a82..208e3e0d79 100644 --- a/hw/net/e1000e_core.c +++ b/hw/net/e1000e_core.c @@ -159,6 +159,8 @@ e1000e_intrmgr_on_throttling_timer(void *opaque) if (msi_enabled(timer->core->owner)) { trace_e1000e_irq_msi_notify_postponed(); + /* Clear msi_causes_pending to fire MSI eventually */ + timer->core->msi_causes_pending = 0; e1000e_set_interrupt_cause(timer->core, 0); } else { trace_e1000e_irq_legacy_notify_postponed(); From 2fdac348fd3d243bb964937236af3cc27ae7af2b Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Eugenio=20P=C3=A9rez?= Date: Mon, 18 Jul 2022 14:05:45 +0200 Subject: [PATCH 13/14] vhost: Get vring base from vq, not svq MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The SVQ vring used idx usually match with the guest visible one, as long as all the guest buffers (GPA) maps to exactly one buffer within qemu's VA. However, as we can see in virtqueue_map_desc, a single guest buffer could map to many buffers in SVQ vring. Also, its also a mistake to rewind them at the source of migration. Since VirtQueue is able to migrate the inflight descriptors, its responsability of the destination to perform the rewind just in case it cannot report the inflight descriptors to the device. This makes easier to migrate between backends or to recover them in vhost devices that support set in flight descriptors. Fixes: 6d0b22266633 ("vdpa: Adapt vhost_vdpa_get_vring_base to SVQ") Signed-off-by: Eugenio Pérez Signed-off-by: Jason Wang --- hw/virtio/vhost-vdpa.c | 24 ++++++++++++------------ 1 file changed, 12 insertions(+), 12 deletions(-) diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c index 291cd19054..bce64f45c2 100644 --- a/hw/virtio/vhost-vdpa.c +++ b/hw/virtio/vhost-vdpa.c @@ -1179,7 +1179,18 @@ static int vhost_vdpa_set_vring_base(struct vhost_dev *dev, struct vhost_vring_state *ring) { struct vhost_vdpa *v = dev->opaque; + VirtQueue *vq = virtio_get_queue(dev->vdev, ring->index); + /* + * vhost-vdpa devices does not support in-flight requests. Set all of them + * as available. + * + * TODO: This is ok for networking, but other kinds of devices might + * have problems with these retransmissions. + */ + while (virtqueue_rewind(vq, 1)) { + continue; + } if (v->shadow_vqs_enabled) { /* * Device vring base was set at device start. SVQ base is handled by @@ -1195,21 +1206,10 @@ static int vhost_vdpa_get_vring_base(struct vhost_dev *dev, struct vhost_vring_state *ring) { struct vhost_vdpa *v = dev->opaque; - int vdpa_idx = ring->index - dev->vq_index; int ret; if (v->shadow_vqs_enabled) { - VhostShadowVirtqueue *svq = g_ptr_array_index(v->shadow_vqs, vdpa_idx); - - /* - * Setting base as last used idx, so destination will see as available - * all the entries that the device did not use, including the in-flight - * processing ones. - * - * TODO: This is ok for networking, but other kinds of devices might - * have problems with these retransmissions. - */ - ring->num = svq->last_used_idx; + ring->num = virtio_queue_get_last_avail_idx(dev->vdev, ring->index); return 0; } From 75a8ce64f6e37513698857fb4284170da163ed06 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Eugenio=20P=C3=A9rez?= Date: Fri, 22 Jul 2022 10:26:30 +0200 Subject: [PATCH 14/14] vdpa: Fix memory listener deletions of iova tree MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit vhost_vdpa_listener_region_del is always deleting the first iova entry of the tree, since it's using the needle iova instead of the result's one. This was detected using a vga virtual device in the VM using vdpa SVQ. It makes some extra memory adding and deleting, so the wrong one was mapped / unmapped. This was undetected before since all the memory was mappend and unmapped totally without that device, but other conditions could trigger it too: * mem_region was with .iova = 0, .translated_addr = (correct GPA). * iova_tree_find_iova returned right result, but does not update mem_region. * iova_tree_remove always removed region with .iova = 0. Right iova were sent to the device. * Next map will fill the first region with .iova = 0, causing a mapping with the same iova and device complains, if the next action is a map. * Next unmap will cause to try to unmap again iova = 0, causing the device to complain that no region was mapped at iova = 0. Fixes: 34e3c94edaef ("vdpa: Add custom IOTLB translations to SVQ") Reported-by: Lei Yang Signed-off-by: Eugenio Pérez Signed-off-by: Jason Wang --- hw/virtio/vhost-vdpa.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c index bce64f45c2..3ff9ce3501 100644 --- a/hw/virtio/vhost-vdpa.c +++ b/hw/virtio/vhost-vdpa.c @@ -290,7 +290,7 @@ static void vhost_vdpa_listener_region_del(MemoryListener *listener, result = vhost_iova_tree_find_iova(v->iova_tree, &mem_region); iova = result->iova; - vhost_iova_tree_remove(v->iova_tree, &mem_region); + vhost_iova_tree_remove(v->iova_tree, result); } vhost_vdpa_iotlb_batch_begin_once(v); ret = vhost_vdpa_dma_unmap(v, iova, int128_get64(llsize));