Full system hooks (#8)

* scsi-disk: add new quirks bitmap to SCSIDiskState

Since the MacOS SCSI implementation is quite old (and Apple added some firmware
customisations to their drives for m68k Macs) there is a need for a mechanism
to correctly handle Apple-specific quirks.

Add a new quirks bitmap to SCSIDiskState that can be used to enable these
features as required.
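As a standalone sketch, the quirks bitmap and its check might look like the
following (the bit assignments and struct shape here are illustrative only;
the real definitions live in QEMU's scsi-disk code):

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative quirk bit numbers; the actual values are assigned in
 * QEMU's headers and may differ. */
enum {
    SCSI_DISK_QUIRK_MODE_PAGE_APPLE_VENDOR,
    SCSI_DISK_QUIRK_MODE_SENSE_ROM_USE_DBD,
    SCSI_DISK_QUIRK_MODE_PAGE_VENDOR_SPECIFIC_APPLE,
    SCSI_DISK_QUIRK_MODE_PAGE_TRUNCATED,
};

typedef struct {
    uint32_t quirks;    /* bitmap of enabled Apple quirks */
} SCSIDiskStateSketch;

/* Test whether a given quirk bit is enabled on this device. */
static int scsi_disk_quirk_enabled(const SCSIDiskStateSketch *s, int quirk)
{
    return (s->quirks & (1u << quirk)) != 0;
}
```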

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Laurent Vivier <laurent@vivier.eu>
Message-Id: <20220622105314.802852-2-mark.cave-ayland@ilande.co.uk>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

* scsi-disk: add MODE_PAGE_APPLE_VENDOR quirk for Macintosh

One of the mechanisms MacOS uses to identify CDROM drives compatible with MacOS
is to send a custom MODE SELECT command for page 0x30 to the drive. The
response to this is a hard-coded manufacturer string which must match in order
for the CDROM to be usable within MacOS.

Add an implementation of the MODE SELECT page 0x30 response guarded by a newly
defined SCSI_DISK_QUIRK_MODE_PAGE_APPLE_VENDOR quirk bit so that CDROM drives
attached to non-Apple machines function exactly as before.
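A sketch of how such a mode page response could be filled in; the magic
string used here is an assumption for illustration, the real string and
page layout are in hw/scsi/scsi-disk.c:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MODE_PAGE_APPLE_VENDOR 0x30

/* Fill a page-0x30 MODE SENSE response into p; returns the number of
 * bytes written. The manufacturer string is what the MacOS driver
 * matches on (exact contents assumed here). */
static int mode_page_apple_vendor(uint8_t *p)
{
    const char apple_magic[] = "APPLE COMPUTER, INC.";

    p[0] = MODE_PAGE_APPLE_VENDOR;          /* page code */
    p[1] = (uint8_t)strlen(apple_magic);    /* page length */
    memcpy(&p[2], apple_magic, strlen(apple_magic));
    return 2 + p[1];
}
```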

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Laurent Vivier <laurent@vivier.eu>
Message-Id: <20220622105314.802852-3-mark.cave-ayland@ilande.co.uk>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

* q800: implement compat_props to enable quirk_mode_page_apple_vendor for scsi-cd devices

By default quirk_mode_page_apple_vendor should be enabled for all scsi-cd devices
connected to the q800 machine to enable MacOS to detect and use them.

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Message-Id: <20220622105314.802852-4-mark.cave-ayland@ilande.co.uk>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

* scsi-disk: add SCSI_DISK_QUIRK_MODE_SENSE_ROM_USE_DBD quirk for Macintosh

During SCSI bus enumeration A/UX sends a MODE SENSE command to the CDROM with
the DBD bit unset and expects the response to include a block descriptor. As per
the latest SCSI documentation, QEMU currently force-disables the block
descriptor for CDROM devices but the A/UX driver expects the requested block
descriptor to be returned.

If the block descriptor is not returned in the response then A/UX becomes
confused, since the block descriptor returned in the MODE SENSE response is
used to generate a subsequent MODE SELECT command which is then invalid.

Add a new SCSI_DISK_QUIRK_MODE_SENSE_ROM_USE_DBD quirk to allow this behaviour
to be enabled as required. Note that an additional workaround is required for
the previous SCSI_DISK_QUIRK_MODE_PAGE_APPLE_VENDOR quirk which must never
return a block descriptor even though the DBD bit is left unset.
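The decision described above can be reduced to a small predicate. This is a
sketch of the rule, not the actual scsi-disk.c control flow:

```c
#include <assert.h>
#include <stdbool.h>

/* Should a MODE SENSE reply carry a block descriptor? Combines the DBD
 * bit, the modern-SCSI behaviour for ROM devices, the new quirk, and
 * the Apple vendor page exception (page 0x30 never gets one). */
static bool include_block_descriptor(bool dbd, bool is_cdrom,
                                     bool quirk_rom_use_dbd,
                                     int page_code)
{
    if (page_code == 0x30) {
        return false;   /* Apple vendor page: never return one */
    }
    if (dbd) {
        return false;   /* initiator asked for it to be omitted */
    }
    if (is_cdrom && !quirk_rom_use_dbd) {
        return false;   /* modern SCSI: force-disabled for CDROMs */
    }
    return true;        /* A/UX path: honour the unset DBD bit */
}
```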

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Message-Id: <20220622105314.802852-5-mark.cave-ayland@ilande.co.uk>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

* q800: implement compat_props to enable quirk_mode_sense_rom_use_dbd for scsi-cd devices

By default quirk_mode_sense_rom_use_dbd should be enabled for all scsi-cd devices
connected to the q800 machine to correctly report the CDROM block descriptor back
to A/UX.

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Laurent Vivier <laurent@vivier.eu>
Message-Id: <20220622105314.802852-6-mark.cave-ayland@ilande.co.uk>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

* scsi-disk: add SCSI_DISK_QUIRK_MODE_PAGE_VENDOR_SPECIFIC_APPLE quirk for Macintosh

Both MacOS and A/UX make use of vendor-specific MODE SELECT commands with PF=0
to identify SCSI devices:

- MacOS sends a MODE SELECT command with PF=0 for the MODE_PAGE_VENDOR_SPECIFIC
  (0x0) mode page containing 2 bytes before initialising a disk

- A/UX (installed on disk) sends a MODE SELECT command with PF=0 during SCSI
  bus enumeration, and gets stuck in an infinite loop if it fails

Add a new SCSI_DISK_QUIRK_MODE_PAGE_VENDOR_SPECIFIC_APPLE quirk to allow both
PF=0 MODE SELECT commands and implement a MODE_PAGE_VENDOR_SPECIFIC (0x0)
mode page which is compatible with MacOS.

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Message-Id: <20220622105314.802852-7-mark.cave-ayland@ilande.co.uk>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

* q800: implement compat_props to enable quirk_mode_page_vendor_specific_apple for scsi devices

By default quirk_mode_page_vendor_specific_apple should be enabled for both scsi-hd
and scsi-cd devices to allow MacOS to format SCSI disk devices, and A/UX to
enumerate SCSI CDROM devices successfully without getting stuck in a loop.
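A simplified standalone model of what compat_props expresses: per-driver
default property values registered by the machine. The real mechanism is
QEMU's GlobalProperty machinery in qdev; this sketch only shows the
driver/property/value triples the q800 machine would register:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct {
    const char *driver;
    const char *property;
    const char *value;
} CompatProp;

static const CompatProp q800_compat[] = {
    { "scsi-hd", "quirk_mode_page_vendor_specific_apple", "on" },
    { "scsi-cd", "quirk_mode_page_vendor_specific_apple", "on" },
};

/* Look up the machine-provided default for a driver property, or NULL
 * if the plain qdev default applies. */
static const char *q800_default(const char *driver, const char *property)
{
    for (size_t i = 0; i < sizeof(q800_compat) / sizeof(q800_compat[0]); i++) {
        if (!strcmp(q800_compat[i].driver, driver) &&
            !strcmp(q800_compat[i].property, property)) {
            return q800_compat[i].value;
        }
    }
    return NULL;
}
```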

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Message-Id: <20220622105314.802852-8-mark.cave-ayland@ilande.co.uk>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

* scsi-disk: add FORMAT UNIT command

When initialising a drive ready to install MacOS, Apple HD SC Setup first attempts
to format the drive. Add a minimal FORMAT UNIT command that simply returns
success to allow the format to succeed.

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-Id: <20220622105314.802852-9-mark.cave-ayland@ilande.co.uk>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

* scsi-disk: add SCSI_DISK_QUIRK_MODE_PAGE_TRUNCATED quirk for Macintosh

When A/UX configures the CDROM device it sends a truncated MODE SELECT request
for page 1 (MODE_PAGE_R_W_ERROR) which is only 6 bytes in length rather than
10. This seems to be due to a bug in Apple's code which calculates the CDB message
length incorrectly.

The work at [1] suggests that this truncated request is accepted on real
hardware whereas in QEMU it generates an INVALID_PARAM_LEN sense code which
causes A/UX to get stuck in a loop retrying the command in an attempt to succeed.

Alter the mode page request length check so that truncated requests are allowed
if the SCSI_DISK_QUIRK_MODE_PAGE_TRUNCATED quirk is enabled, whilst also adding a
trace event to enable the condition to be detected.

[1] https://68kmla.org/bb/index.php?threads/scsi2sd-project-anyone-interested.29040/page-7#post-316444
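The relaxed length check can be sketched as follows (a simplified model of
the rule, not the actual scsi-disk.c code):

```c
#include <assert.h>
#include <stdbool.h>

/* With the quirk enabled, a MODE SELECT payload shorter than the page
 * demands (A/UX sends 6 bytes for the 10-byte MODE_PAGE_R_W_ERROR
 * request) is accepted instead of failing with INVALID_PARAM_LEN.
 * Over-long requests are still rejected. */
static bool mode_select_len_ok(int expected_len, int received_len,
                               bool quirk_truncated)
{
    if (received_len == expected_len) {
        return true;
    }
    return quirk_truncated && received_len < expected_len;
}
```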

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Message-Id: <20220622105314.802852-10-mark.cave-ayland@ilande.co.uk>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

* q800: implement compat_props to enable quirk_mode_page_truncated for scsi-cd devices

By default quirk_mode_page_truncated should be enabled for all scsi-cd devices
connected to the q800 machine to allow A/UX to enumerate SCSI CDROM devices
without hanging.

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Message-Id: <20220622105314.802852-11-mark.cave-ayland@ilande.co.uk>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

* scsi-disk: allow the MODE_PAGE_R_W_ERROR AWRE bit to be changeable for CDROM drives

A/UX sends a MODE SELECT command for the MODE_PAGE_R_W_ERROR page with the AWRE
bit set to 0 when enumerating CDROM drives. Since the bit is currently hardcoded
to 1, indicate that the AWRE bit can be changed (even though we don't care about
the value) so that the MODE_PAGE_R_W_ERROR page can be set successfully.
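For context: MODE SENSE with PC=01b ("changeable values") returns a mask in
which each set bit marks a field the initiator may alter via MODE SELECT.
A sketch for byte 2 of MODE_PAGE_R_W_ERROR, where AWRE is bit 7 (the
CDROM/non-CDROM split here is illustrative, not the exact scsi-disk.c
behaviour):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define AWRE 0x80

/* Changeable-values mask for byte 2 of the R/W error recovery page:
 * report AWRE as changeable for CDROMs so that A/UX's MODE SELECT with
 * AWRE=0 succeeds (the written value is then ignored). */
static uint8_t rw_error_changeable_byte2(bool is_cdrom)
{
    return is_cdrom ? AWRE : 0x00;
}
```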

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Message-Id: <20220622105314.802852-12-mark.cave-ayland@ilande.co.uk>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

* scsi-disk: allow MODE SELECT block descriptor to set the block size

The MODE SELECT command can contain an optional block descriptor that can be used
to set the device block size. If the block descriptor is present then update the
block size on the SCSI device accordingly.

This allows CDROMs to be used with A/UX which requires a CDROM drive which is
capable of switching from a 2048 byte sector size to a 512 byte sector size.
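The 8-byte mode parameter block descriptor carries the block length in its
last three bytes, big-endian (byte 0 is density, bytes 1-3 the number of
blocks, byte 4 reserved). A sketch of extracting the requested block size:

```c
#include <assert.h>
#include <stdint.h>

/* Return the block length encoded in a standard 8-byte mode parameter
 * block descriptor (bytes 5-7, big-endian). */
static uint32_t block_desc_blocksize(const uint8_t *bd)
{
    return ((uint32_t)bd[5] << 16) | ((uint32_t)bd[6] << 8) | bd[7];
}
```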

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Message-Id: <20220622105314.802852-13-mark.cave-ayland@ilande.co.uk>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

* q800: add default vendor and product information for scsi-hd devices

The Apple HD SC Setup program uses a SCSI INQUIRY command to check that any SCSI
hard disks detected match a whitelist of vendors and products before allowing
the "Initialise" button to prepare an empty disk.

Add known-good default vendor and product information using the existing
compat_prop mechanism so the user doesn't have to use long command lines to set
the qdev properties manually.
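Standard INQUIRY data carries fixed-width, space-padded ASCII fields: 8
bytes of vendor identification and 16 bytes of product identification,
which is what the whitelist match is made against. A sketch of the padding
rule:

```c
#include <assert.h>
#include <string.h>

/* Copy src into a fixed-width INQUIRY field, truncating if too long and
 * padding with spaces if too short. */
static void inquiry_pad(char *dst, size_t width, const char *src)
{
    size_t n = strlen(src);

    if (n > width) {
        n = width;
    }
    memcpy(dst, src, n);
    memset(dst + n, ' ', width - n);
}
```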

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Laurent Vivier <laurent@vivier.eu>
Message-Id: <20220622105314.802852-14-mark.cave-ayland@ilande.co.uk>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

* q800: add default vendor and product information for scsi-cd devices

The MacOS CDROM driver uses a SCSI INQUIRY command to check that any SCSI CDROMs
detected match a whitelist of vendors and products before adding them to the
list of available devices.

Add known-good default vendor and product information using the existing
compat_prop mechanism so the user doesn't have to use long command lines to set
the qdev properties manually.

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Laurent Vivier <laurent@vivier.eu>
Message-Id: <20220622105314.802852-15-mark.cave-ayland@ilande.co.uk>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

* pc-bios/s390-ccw: add -Wno-array-bounds

The -Warray-bounds checks generate a lot of warnings for integers cast to
pointers, for example:

/home/pbonzini/work/upstream/qemu/pc-bios/s390-ccw/dasd-ipl.c:174:19: warning: array subscript 0 is outside array bounds of ‘CcwSeekData[0]’ [-Warray-bounds]
  174 |     seekData->cyl = 0x00;
      |     ~~~~~~~~~~~~~~^~~~~~

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

* aspeed: sbc: Allow per-machine settings

In order to correctly report secure boot to the running firmware, the values
of certain registers must be set.

We don't yet have documentation from ASPEED on what they mean. The
meaning is inferred from u-boot's use of them.

Introduce properties so the settings can be configured per-machine.

Reviewed-by: Peter Delevoryas <pdel@fb.com>
Tested-by: Peter Delevoryas <pdel@fb.com>
Signed-off-by: Joel Stanley <joel@jms.id.au>
Message-Id: <20220628154740.1117349-4-clg@kaod.org>
Signed-off-by: Cédric Le Goater <clg@kaod.org>

* hw/i2c/pmbus: Add idle state to return 0xff's

Signed-off-by: Peter Delevoryas <pdel@fb.com>
Reviewed-by: Titus Rwantare <titusr@google.com>
Message-Id: <20220701000626.77395-2-me@pjd.dev>
Signed-off-by: Cédric Le Goater <clg@kaod.org>

* hw/sensor: Add IC_DEVICE_ID to ISL voltage regulators

This commit adds a passthrough for PMBUS_IC_DEVICE_ID to allow Renesas
voltage regulators to return the integrated circuit device ID if they
would like to.

The behavior is very device specific, so it hasn't been added to the
general PMBUS model. Additionally, if the device ID hasn't been set,
then the voltage regulator will respond with the error byte value.  The
guest error message will change slightly for IC_DEVICE_ID with this
commit.

Signed-off-by: Peter Delevoryas <pdel@fb.com>
Reviewed-by: Titus Rwantare <titusr@google.com>
Message-Id: <20220701000626.77395-3-me@pjd.dev>
Signed-off-by: Cédric Le Goater <clg@kaod.org>

* hw/sensor: Add Renesas ISL69259 device model

This adds the ISL69259, using all the same functionality as the existing
ISL69260 but overriding the IC_DEVICE_ID.

Signed-off-by: Peter Delevoryas <pdel@fb.com>
Reviewed-by: Titus Rwantare <titusr@google.com>
Message-Id: <20220701000626.77395-4-me@pjd.dev>
Signed-off-by: Cédric Le Goater <clg@kaod.org>

* aspeed: Create SRAM name from first CPU index

To support multiple SoCs running simultaneously, we need a unique name for
each RAM region. DRAM is created by the machine, but SRAM is created by the
SoC, since in hardware it is part of the SoC's internals.

We need a way to uniquely identify each SRAM region for VM migration. Since
each of the SoC's CPUs has an index which identifies it uniquely from other
CPUs in the machine, we can use the index of any of the CPUs in the SoC to
differentiate the SRAM name from those of the other SoCs. In this change, I
just elected to use the index of the first CPU in each SoC.
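As a sketch, the derived region name could be built like this (the
"aspeed.sram.<index>" pattern is an assumption for illustration; QEMU would
use g_strdup_printf rather than a caller-supplied buffer):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build a migration-stable, per-SoC SRAM region name from the index of
 * the SoC's first CPU. */
static void aspeed_sram_name(char *buf, size_t len, int first_cpu_index)
{
    snprintf(buf, len, "aspeed.sram.%d", first_cpu_index);
}
```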

Signed-off-by: Peter Delevoryas <peter@pjd.dev>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20220705191400.41632-3-peter@pjd.dev>
Signed-off-by: Cédric Le Goater <clg@kaod.org>

* aspeed: Refactor UART init for multi-SoC machines

This change moves the code that connects the SoC UARTs to serial_hd
backends into the machine.

It makes each UART a proper child member of the SoC, and then allows the
machine to selectively initialize the chardev for each UART with a
serial_hd.

This should preserve backwards compatibility, but also allow multi-SoC
boards to completely change the wiring of serial devices from the
command line to specific SoC UARTs.

This also removes the uart-default property from the SoC, since the SoC
doesn't need to know what UART is the "default" on the machine anymore.

I tested this using the images and commands from the previous
refactoring, and another test image for the ast1030:

    wget https://github.com/facebook/openbmc/releases/download/v2021.49.0/fuji.mtd
    wget https://github.com/facebook/openbmc/releases/download/v2021.49.0/wedge100.mtd
    wget https://github.com/peterdelevoryas/OpenBIC/releases/download/oby35-cl-2022.13.01/Y35BCL.elf

Fuji uses UART1:

    qemu-system-arm -machine fuji-bmc \
        -drive file=fuji.mtd,format=raw,if=mtd \
        -nographic

ast2600-evb uses uart-default=UART5:

    qemu-system-arm -machine ast2600-evb \
        -drive file=fuji.mtd,format=raw,if=mtd \
        -serial null -serial mon:stdio -display none

Wedge100 uses UART3:

    qemu-system-arm -machine palmetto-bmc \
        -drive file=wedge100.mtd,format=raw,if=mtd \
        -serial null -serial null -serial null \
        -serial mon:stdio -display none

AST1030 EVB uses UART5:

    qemu-system-arm -machine ast1030-evb \
        -kernel Y35BCL.elf -nographic

Fixes: 6827ff20b2975 ("hw: aspeed: Init all UART's with serial devices")
Signed-off-by: Peter Delevoryas <peter@pjd.dev>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20220705191400.41632-4-peter@pjd.dev>
Signed-off-by: Cédric Le Goater <clg@kaod.org>

* aspeed: Make aspeed_board_init_flashes public

Signed-off-by: Peter Delevoryas <peter@pjd.dev>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20220705191400.41632-5-peter@pjd.dev>
Signed-off-by: Cédric Le Goater <clg@kaod.org>

* aspeed: Add fby35 skeleton

Signed-off-by: Peter Delevoryas <peter@pjd.dev>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20220705191400.41632-6-peter@pjd.dev>
Signed-off-by: Cédric Le Goater <clg@kaod.org>

* aspeed: Add AST2600 (BMC) to fby35

You can test booting the BMC with both '-device loader' and '-drive
file'. This is necessary because of how the fb-openbmc boot sequence
works (jump to 0x20000000 after U-Boot SPL).

    wget https://github.com/facebook/openbmc/releases/download/openbmc-e2294ff5d31d/fby35.mtd
    qemu-system-arm -machine fby35 -nographic \
        -device loader,file=fby35.mtd,addr=0,cpu-num=0 -drive file=fby35.mtd,format=raw,if=mtd

Signed-off-by: Peter Delevoryas <peter@pjd.dev>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20220705191400.41632-7-peter@pjd.dev>
Signed-off-by: Cédric Le Goater <clg@kaod.org>

* aspeed: fby35: Add a bootrom for the BMC

The BMC boots from the first flash device by fetching instructions
from the flash contents. Add an alias region on 0x0 for this
purpose. There are currently performance issues with this method (TBs
being flushed too often), so as a faster alternative, install the
flash contents as a ROM in the BMC memory space.

See commit 1a15311a12fa ("hw/arm/aspeed: add a 'execute-in-place'
property to boot directly from CE0")

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Peter Delevoryas <peter@pjd.dev>
[ clg: blk_pread() fixes ]
Message-Id: <20220705191400.41632-8-peter@pjd.dev>
Signed-off-by: Cédric Le Goater <clg@kaod.org>

* aspeed: Add AST1030 (BIC) to fby35

With the BIC, the easiest way to run everything is to create two ptys,
one for each SoC, and reserve stdin/stdout for the monitor:

    wget https://github.com/facebook/openbmc/releases/download/openbmc-e2294ff5d31d/fby35.mtd
    wget https://github.com/peterdelevoryas/OpenBIC/releases/download/oby35-cl-2022.13.01/Y35BCL.elf
    qemu-system-arm -machine fby35 \
        -drive file=fby35.mtd,format=raw,if=mtd \
        -device loader,file=fby35.mtd,addr=0,cpu-num=0 \
        -serial pty -serial pty -serial mon:stdio -display none -S

    screen /dev/ttys0
    screen /dev/ttys1
    (qemu) c

This commit only adds the first server board's Bridge IC, but in the
future we'll try to include the other three server boards' Bridge ICs
too.

Signed-off-by: Peter Delevoryas <peter@pjd.dev>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20220705191400.41632-9-peter@pjd.dev>
Signed-off-by: Cédric Le Goater <clg@kaod.org>

* docs: aspeed: Add fby35 multi-SoC machine section

Signed-off-by: Peter Delevoryas <peter@pjd.dev>
Reviewed-by: Joel Stanley <joel@jms.id.au>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
[ clg: - fixed URL links
       - Moved Facebook Yosemite section at the end of the file ]
Message-Id: <20220705191400.41632-10-peter@pjd.dev>
Signed-off-by: Cédric Le Goater <clg@kaod.org>

* docs: aspeed: Minor updates

Some more controllers have been modeled recently. Reflect that in the
list of supported devices. New machines were also added.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Peter Delevoryas <peter@pjd.dev>
Reviewed-by: Joel Stanley <joel@jms.id.au>
Message-Id: <20220706172131.809255-1-clg@kaod.org>
Signed-off-by: Cédric Le Goater <clg@kaod.org>

* test/avocado/machine_aspeed.py: Add SDK tests

The Aspeed SDK kernel usually includes support for the latest HW
features, which makes it useful for exercising QEMU and discovering gaps
in the models.

Add extra I2C tests for the AST2600 EVB machine to check the new
register interface.

Message-Id: <20220707091239.1029561-1-clg@kaod.org>
Signed-off-by: Cédric Le Goater <clg@kaod.org>

* hw: m25p80: Add Block Protect and Top Bottom bits for write protect

Signed-off-by: Iris Chen <irischenlj@fb.com>
Reviewed-by: Francisco Iglesias <frasse.iglesias@gmail.com>
Message-Id: <20220708164552.3462620-1-irischenlj@fb.com>
Signed-off-by: Cédric Le Goater <clg@kaod.org>

* hw: m25p80: add tests for BP and TB bit write protect

Signed-off-by: Iris Chen <irischenlj@fb.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20220627185234.1911337-3-irischenlj@fb.com>
Signed-off-by: Cédric Le Goater <clg@kaod.org>

* qtest/aspeed_gpio: Add input pin modification test

Verify the current behavior, which is that input pins can be modified by
guest OS register writes.

Signed-off-by: Peter Delevoryas <peter@pjd.dev>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20220712023219.41065-2-peter@pjd.dev>
Signed-off-by: Cédric Le Goater <clg@kaod.org>

* hw/gpio/aspeed: Don't let guests modify input pins

Up until now, guests could modify input pins by overwriting the data
value register. The guest OS should only be allowed to modify output pin
values, and the QOM property setter should only be permitted to modify
input pins.

This change also updates the gpio input pin test to match this
expectation.

Andrew suggested this particular refactoring here:

    https://lore.kernel.org/qemu-devel/23523aa1-ba81-412b-92cc-8174faba3612@www.fastmail.com/

Suggested-by: Andrew Jeffery <andrew@aj.id.au>
Signed-off-by: Peter Delevoryas <peter@pjd.dev>
Fixes: 4b7f956862dc ("hw/gpio: Add basic Aspeed GPIO model for AST2400 and AST2500")
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20220712023219.41065-3-peter@pjd.dev>
Signed-off-by: Cédric Le Goater <clg@kaod.org>

* aspeed: Add fby35-bmc slot GPIO's

Signed-off-by: Peter Delevoryas <peter@pjd.dev>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20220712023219.41065-4-peter@pjd.dev>
Signed-off-by: Cédric Le Goater <clg@kaod.org>

* hw/nvme: Implement shadow doorbell buffer support

Implement the Doorbell Buffer Config command (Section 5.7 in NVMe Spec 1.3)
and the Shadow Doorbell buffer & EventIdx buffer handling logic (Section 7.13
in NVMe Spec 1.3). For queues created before the Doorbell Buffer Config
command, the nvme_dbbuf_config function tries to associate each existing
SQ and CQ with its Shadow Doorbell buffer and EventIdx buffer address.
Queues created after the Doorbell Buffer Config command will have the
doorbell buffers associated with them when they are initialized.

In nvme_process_sq and nvme_post_cqe, proactively check for Shadow
Doorbell buffer changes instead of waiting for doorbell register changes.
This reduces the number of MMIOs.
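The shadow doorbell mechanism hinges on the EventIdx convention (the same
one virtio rings use): the host only needs a real doorbell MMIO when the
new tail has passed the EventIdx the device advertised. A sketch of the
wrap-safe comparison, mirroring the spec's definition rather than QEMU's
exact helper:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if moving the index from old_idx to new_idx crossed event_idx,
 * i.e. a doorbell write is still required. All arithmetic is mod 2^16,
 * so the test stays correct across wraparound. */
static bool nvme_need_doorbell_event(uint16_t event_idx,
                                     uint16_t new_idx,
                                     uint16_t old_idx)
{
    return (uint16_t)(new_idx - event_idx - 1) <
           (uint16_t)(new_idx - old_idx);
}
```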

In nvme_process_db(), update the shadow doorbell buffer value with
the doorbell register value if it is the admin queue. This is a hack
since hosts like Linux NVMe driver and SPDK do not use shadow
doorbell buffer for the admin queue. Copying the doorbell register
value to the shadow doorbell buffer allows us to support these hosts
as well as spec-compliant hosts that use shadow doorbell buffer for
the admin queue.

Signed-off-by: Jinhao Fan <fanjinhao21s@ict.ac.cn>
Reviewed-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
[k.jensen: rebased]
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>

* hw/nvme: Add trace events for shadow doorbell buffer

When shadow doorbell buffer is enabled, doorbell registers are lazily
updated. The actual queue head and tail pointers are stored in Shadow
Doorbell buffers.

Add trace events for updates on the Shadow Doorbell buffers and EventIdx
buffers. Also add trace event for the Doorbell Buffer Config command.

Signed-off-by: Jinhao Fan <fanjinhao21s@ict.ac.cn>
Reviewed-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
[k.jensen: rebased]
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>

* hw/nvme: fix example serial in documentation

The serial prop on the controller is actually describing the nvme
subsystem serial, which has to be identical for all controllers within
the same nvme subsystem.

This is enforced since commit a859eb9f8f64 ("hw/nvme: enforce common
serial per subsystem").

Fix the documentation, so that people copying the qemu command line
example won't get an error on qemu start.

Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>

* hw/nvme: force nvme-ns param 'shared' to false if no nvme-subsys node

Since commit 916b0f0b5264 ("hw/nvme: change nvme-ns 'shared' default")
the default value of the nvme-ns param 'shared' is set to true, regardless
of whether there is an nvme-subsys node or not.

On a system without an nvme-subsys node, a namespace will never be able
to be attached to more than one controller, so for this configuration,
it is counterintuitive for this parameter to be set by default.

Force the nvme-ns param 'shared' to false for configurations where
there is no nvme-subsys node, as the namespace will never be able to
attach to more than one controller anyway.

Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Reviewed-by: Klaus Jensen <k.jensen@samsung.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>

* nvme: Fix misleading macro when mixed with ternary operator

Using the Parfait source code analyser, an issue was found in
hw/nvme/ctrl.c where the macros NVME_CAP_SET_CMBS and NVME_CAP_SET_PMRS
are called with a ternary operator in the second parameter, resulting
in a potentially unexpected expansion of the form:

  x ? a: b & FLAG_TEST

which produces a different result from:

  (x ? a: b) & FLAG_TEST.

The macros should wrap each of the parameters in brackets to ensure the
correct result on expansion.
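A standalone illustration of the hazard, using toy macros rather than the
actual NVME_CAP_SET_* definitions:

```c
#include <assert.h>

#define FLAG_TEST 0x1
#define BAD_MASK(v)   (v & FLAG_TEST)    /* parameter not parenthesized */
#define GOOD_MASK(v)  ((v) & FLAG_TEST)  /* parameter parenthesized */

/* BAD_MASK(x ? 2 : 3) expands to (x ? 2 : 3 & FLAG_TEST); since & binds
 * tighter than ?:, the mask is only applied on the else arm. */
static int bad(int x)  { return BAD_MASK(x ? 2 : 3); }
static int good(int x) { return GOOD_MASK(x ? 2 : 3); }
```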

Signed-off-by: Darren Kenny <darren.kenny@oracle.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Klaus Jensen <k.jensen@samsung.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>

* hw/nvme: Use ioeventfd to handle doorbell updates

Add property "ioeventfd" which is enabled by default. When this is
enabled, updates on the doorbell registers will cause KVM to signal
an event to the QEMU main loop to handle the doorbell updates.
Therefore, instead of letting the vcpu thread run both guest VM and
IO emulation, we now use the main loop thread to do IO emulation and
thus the vcpu thread has more cycles for the guest VM.

Since ioeventfd does not tell us the exact value that is written, it is
only useful when shadow doorbell buffer is enabled, where we check
for the value in the shadow doorbell buffer when we get the doorbell
update event.

IOPS comparison on Linux 5.19-rc2: (Unit: KIOPS)

qd           1   4  16  64
qemu        35 121 176 153
ioeventfd   41 133 258 313

Changes since v3:
 - Do not deregister ioeventfd when it was not enabled on a SQ/CQ

Signed-off-by: Jinhao Fan <fanjinhao21s@ict.ac.cn>
Reviewed-by: Klaus Jensen <k.jensen@samsung.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>

* MAINTAINERS: Add myself as Guest Agent co-maintainer

Signed-off-by: Konstantin Kostiuk <kkostiuk@redhat.com>
Acked-by: Michael Roth <michael.roth@amd.com>

* hw/intc/armv7m_nvic: ICPRn must not unpend an IRQ that is being held high

In the M-profile Arm ARM, rule R_CVJS defines when an interrupt should
be set to the Pending state:
 A) when the input line is high and the interrupt is not Active
 B) when the input line transitions from low to high and the interrupt
    is Active
(Note that the first of these is an ongoing condition, and the
second is a point-in-time event.)

This can be rephrased as:
 1 when the line goes from low to high, set Pending
 2 when Active goes from 1 to 0, if line is high then set Pending
 3 ignore attempts to clear Pending when the line is high
   and Active is 0

where 1 covers both B and one of the "transition into condition A"
cases, 2 deals with the other "transition into condition A"
possibility, and 3 is "don't drop Pending if we're already in
condition A".  Transitions out of condition A don't affect Pending
state.
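For the ICPRn write path (case 3), the rule reduces to a small predicate.
This is a sketch of the rule for an interrupt that is currently Pending,
not the armv7m_nvic implementation:

```c
#include <assert.h>
#include <stdbool.h>

/* Resulting Pending state after a guest writes 1 to this interrupt's
 * NVIC_ICPRn bit, given the input line level and Active state. */
static bool pending_after_icpr_write(bool line_high, bool active)
{
    if (line_high && !active) {
        return true;    /* condition A holds: ignore the clear attempt */
    }
    return false;       /* otherwise the write clears Pending */
}
```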

We handle case 1 in set_irq_level(). For an interrupt (as opposed
to other kinds of exception) the only place where we clear Active
is in armv7m_nvic_complete_irq(), where we handle case 2 by
checking for whether we need to re-pend the exception. For case 3,
the only places where we clear Pending state on an interrupt are in
armv7m_nvic_acknowledge_irq() (where we are setting Active so it
doesn't count) and for writes to NVIC_ICPRn.

It is the "write to NVIC_ICPRn" case that we missed: we must ignore
this if the input line is high and the interrupt is not Active.
(This required behaviour is differently and perhaps more clearly
stated in the v7M Arm ARM, which has pseudocode in section B3.4.1
that implies it.)

Reported-by: Igor Kotrasiński <i.kotrasinsk@samsung.com>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-id: 20220628154724.3297442-1-peter.maydell@linaro.org

* target/arm: Fill in VL for tbflags when SME enabled and SVE disabled

When PSTATE.SM is set, VL = SVL even if SVE is disabled.
This is visible in kselftest ssve-test.

Reported-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20220713045848.217364-2-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

* target/arm: Fix aarch64_sve_change_el for SME

We were only checking for SVE disabled and not taking into
account PSTATE.SM to check SME disabled, which resulted in
vectors being incorrectly truncated.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20220713045848.217364-3-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

* linux-user/aarch64: Do not clear PROT_MTE on mprotect

The documentation for PROT_MTE says that it cannot be cleared
by mprotect.  Further, the kernel's implementation of the VM_ARCH_CLEAR
bits contains PROT_BTI, confirming that PROT_BTI should be cleared.

Introduce PAGE_TARGET_STICKY to allow target/arch/cpu.h to control
which bits may be reset during page_set_flags.  This is sort of the
opposite of VM_ARCH_CLEAR, but works better with qemu's PAGE_* bits
that are separate from PROT_* bits.

Reported-by: Vitaly Buka <vitalybuka@google.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20220711031420.17820-1-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

* target/arm: Define and use new regime_tcr_value() function

The regime_tcr() function returns a pointer to a struct TCR
corresponding to the TCR controlling a translation regime.  The
struct TCR has the raw value of the register, plus two fields mask
and base_mask which are used as a small optimization in the case of
32-bit short-descriptor lookups.  Almost all callers of regime_tcr()
only want the raw register value.  Define and use a new
regime_tcr_value() function which returns only the raw 64-bit
register value.

This is a preliminary to removing the 32-bit short descriptor
optimization -- it only saves a handful of bit operations, which is
tiny compared to the overhead of doing a page table walk at all, and
the TCR struct is awkward and makes fixing
https://gitlab.com/qemu-project/qemu/-/issues/1103 unnecessarily
difficult.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20220714132303.1287193-2-peter.maydell@linaro.org

* target/arm: Calculate mask/base_mask in get_level1_table_address()

In get_level1_table_address(), instead of using precalculated values
of mask and base_mask from the TCR struct, calculate them directly
(in the same way we currently do in vmsa_ttbcr_raw_write() to
populate the TCR struct fields).

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20220714132303.1287193-3-peter.maydell@linaro.org

* target/arm: Fold regime_tcr() and regime_tcr_value() together

The only caller of regime_tcr() is now regime_tcr_value(); fold the
two together, and use the shorter and more natural 'regime_tcr'
name for the new function.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20220714132303.1287193-4-peter.maydell@linaro.org

* target/arm: Fix big-endian host handling of VTCR

We have a bug in our handling of accesses to the AArch32 VTCR
register on big-endian hosts: we were not adjusting the part of the
uint64_t field within TCR that the generated code would access.  That
can be done with offsetoflow32(), by using an ARM_CP_STATE_BOTH cpreg
struct, or by defining a full set of read/write/reset functions --
the various other TCR cpreg structs used one or another of those
strategies, but for VTCR we did not, so on a big-endian host VTCR
accesses would touch the wrong half of the register.

Use offsetoflow32() in the VTCR register struct.  This works even
though the field in the CPU struct is currently a struct TCR, because
the first field in that struct is the uint64_t raw_tcr.

None of the other TCR registers have this bug -- either they are
AArch64 only, or else they define resetfn, writefn, etc, and
expect to be passed the full struct pointer.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20220714132303.1287193-5-peter.maydell@linaro.org

* target/arm: Store VTCR_EL2, VSTCR_EL2 registers as uint64_t

Change the representation of the VSTCR_EL2 and VTCR_EL2 registers in
the CPU state struct from struct TCR to uint64_t.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20220714132303.1287193-6-peter.maydell@linaro.org

* target/arm: Store TCR_EL* registers as uint64_t

Change the representation of the TCR_EL* registers in the CPU state
struct from struct TCR to uint64_t.  This allows us to drop the
custom vmsa_ttbcr_raw_write() function, moving the "enforce RES0"
checks to their more usual location in the writefn
vmsa_ttbcr_write().  We also don't need the resetfn any more.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20220714132303.1287193-7-peter.maydell@linaro.org

* target/arm: Honour VTCR_EL2 bits in Secure EL2

In regime_tcr() we return the appropriate TCR register for the
translation regime.  For Secure EL2, we return the VSTCR_EL2 value,
but in this translation regime some fields that control behaviour are
in VTCR_EL2.  When this code was originally written (as the comment
notes), QEMU didn't care about any of those fields, but we have since
added support for features such as LPA2 which do need the values from
those fields.

Synthesize a TCR value by merging in the relevant VTCR_EL2 fields to
the VSTCR_EL2 value.

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1103
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20220714132303.1287193-8-peter.maydell@linaro.org

* hw/adc: Fix CONV bit in NPCM7XX ADC CON register

The CONV bit in the NPCM7XX ADC CON register is bit 13. This patch
fixes that in the module, and also lowers the IRQ when the guest
is done handling an interrupt event from the ADC module.

Signed-off-by: Hao Wu <wuhaotsh@google.com>
Reviewed-by: Patrick Venture <venture@google.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 20220714182836.89602-4-wuhaotsh@google.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

* hw/adc: Make adci[*] R/W in NPCM7XX ADC

Our sensor test requires both reading and writing a sensor's
QOM property, so we need to make the inputs of the ADC module R/W
instead of write-only for that to work.

Signed-off-by: Hao Wu <wuhaotsh@google.com>
Reviewed-by: Titus Rwantare <titusr@google.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 20220714182836.89602-5-wuhaotsh@google.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

* target/arm: Don't set syndrome ISS for loads and stores with writeback

The architecture requires that for faults on loads and stores which
do writeback, the syndrome information does not have the ISS
instruction syndrome information (i.e. ISV is 0).  We got this wrong
for the load and store instructions covered by disas_ldst_reg_imm9().
Calculate iss_valid correctly so that if the insn is a writeback one
it is false.

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1057
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20220715123323.1550983-1-peter.maydell@linaro.org

* Align Raspberry Pi DMA interrupts with Linux DTS

There is nothing in the specs on DMA engine interrupt lines: it should have
been in the "BCM2835 ARM Peripherals" datasheet but the appropriate
"ARM peripherals interrupt table" (p.113) is nearly empty.

All Raspberry Pi models 1-3 (based on bcm2835) have
Linux device tree (arch/arm/boot/dts/bcm2835-common.dtsi +25):

    /* dma channel 11-14 share one irq */

This information is repeated in the driver code
(drivers/dma/bcm2835-dma.c +1344):

    /*
     * in case of channel >= 11
     * use the 11th interrupt and that is shared
     */

In this patch channels 0--10 and 11--14 are handled separately.

Signed-off-by: Andrey Makarov <andrey.makarov@auriga.com>
Message-id: 20220716113210.349153-1-andrey.makarov@auriga.com
[PMM: fixed checkpatch nits]
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

* monitor: add support for boolean statistics

The next version of Linux will introduce boolean statistics, which
can only have 0 or 1 values.  Support them in the schema and in
the HMP command.

Suggested-by: Amneesh Singh <natto@weirdnatto.in>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

* kvm: add support for boolean statistics

The next version of Linux will introduce boolean statistics, which
can only have 0 or 1 values.  Convert them to the new QAPI fields
added in the previous commit.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

* ppc64: Allocate IRQ lines with qdev_init_gpio_in()

This replaces the IRQ array 'irq_inputs' with GPIO lines, the goal
being to remove 'irq_inputs' when all CPUs have been converted.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Acked-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-Id: <20220705145814.461723-2-clg@kaod.org>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>

* ppc/40x: Allocate IRQ lines with qdev_init_gpio_in()

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-Id: <20220705145814.461723-3-clg@kaod.org>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>

* ppc/6xx: Allocate IRQ lines with qdev_init_gpio_in()

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Acked-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-Id: <20220705145814.461723-4-clg@kaod.org>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>

* ppc/e500: Allocate IRQ lines with qdev_init_gpio_in()

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-Id: <20220705145814.461723-5-clg@kaod.org>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>

* ppc: Remove unused irq_inputs

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-Id: <20220705145814.461723-6-clg@kaod.org>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>

* hw/ppc: pass random seed to fdt

If the FDT contains /chosen/rng-seed, then the Linux RNG will use it to
initialize early. Set this using the usual guest random number
generation function. This is confirmed to successfully initialize the
RNG on Linux 5.19-rc6. The rng-seed node is part of the DT spec. Set
this on the paravirt platforms, spapr and e500, just as is done on other
architectures with paravirt hardware.
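
The resulting FDT node looks roughly like the fragment below; the cell values are placeholders, as the actual seed is freshly generated for each boot.

```dts
/ {
    chosen {
        /* 32 bytes of host-provided entropy (placeholder values) */
        rng-seed = <0x12345678 0x9abcdef0 0x0fedcba9 0x87654321
                    0x11111111 0x22222222 0x33333333 0x44444444>;
    };
};
```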

Cc: Daniel Henrique Barboza <danielhb413@gmail.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20220712135114.289855-1-Jason@zx2c4.com>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>

* target/ppc/kvm: Skip current and parent directories in kvmppc_find_cpu_dt

Some systems have /proc/device-tree/cpus/../clock-frequency. However,
this is not the expected path for a CPU device tree directory.

Signed-off-by: Murilo Opsfelder Araujo <muriloo@linux.ibm.com>
Signed-off-by: Fabiano Rosas <farosas@linux.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20220712210810.35514-1-muriloo@linux.ibm.com>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>

* target/ppc: Fix gen_priv_exception error value in mfspr/mtspr

The code in linux-user/ppc/cpu_loop.c expects POWERPC_EXCP_PRIV
exception with error POWERPC_EXCP_PRIV_OPC or POWERPC_EXCP_PRIV_REG,
while POWERPC_EXCP_INVAL_SPR is expected in POWERPC_EXCP_INVAL
exceptions. This mismatch caused an EXCP_DUMP with the message "Unknown
privilege violation (03)", as seen in [1].

[1] https://gitlab.com/qemu-project/qemu/-/issues/588

Fixes: 9b2fadda3e01 ("ppc: Rework generation of priv and inval interrupts")
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/588
Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com>
Signed-off-by: Matheus Ferst <matheus.ferst@eldorado.org.br>
Message-Id: <20220627141104.669152-2-matheus.ferst@eldorado.org.br>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>

* target/ppc: fix exception error value in slbfee

Testing on a POWER9 DD2.3, we observed that the Linux kernel delivers a
signal with si_code ILL_PRVOPC (5) when a userspace application tries to
use slbfee. To obtain this behavior on linux-user, we should use
POWERPC_EXCP_PRIV with POWERPC_EXCP_PRIV_OPC.

No functional change is intended for softmmu targets as
gen_hvpriv_exception uses the same 'exception' argument
(POWERPC_EXCP_HV_EMU) for raise_exception_*, and the powerpc_excp_*
methods do not use lower bits of the exception error code when handling
POWERPC_EXCP_{INVAL,PRIV}.

Reported-by: Laurent Vivier <laurent@vivier.eu>
Signed-off-by: Matheus Ferst <matheus.ferst@eldorado.org.br>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20220627141104.669152-3-matheus.ferst@eldorado.org.br>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>

* target/ppc: remove mfdcrux and mtdcrux

The only PowerPC implementations with these insns were the 460 and 460F,
which had their definitions removed in [1].

[1] 7ff26aa6c657 ("target/ppc: Remove unused PPC 460 and 460F definitions")

Signed-off-by: Matheus Ferst <matheus.ferst@eldorado.org.br>
Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com>
Message-Id: <20220627141104.669152-4-matheus.ferst@eldorado.org.br>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>

* target/ppc: fix exception error code in helper_{load, store}_dcr

POWERPC_EXCP_INVAL should only be or-ed with other constants prefixed
with POWERPC_EXCP_INVAL_. Also, take the opportunity to move both
helpers under #if !defined(CONFIG_USER_ONLY) as the instructions that
use them are privileged.

No functional change is intended, the lower 4 bits of the error code are
ignored by all powerpc_excp_* methods on POWERPC_EXCP_INVAL exceptions.

Reported-by: Laurent Vivier <laurent@vivier.eu>
Signed-off-by: Matheus Ferst <matheus.ferst@eldorado.org.br>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20220627141104.669152-5-matheus.ferst@eldorado.org.br>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>

* target/ppc: fix PMU Group A register read/write exceptions

A call to "gen_(hv)priv_exception" should use POWERPC_EXCP_PRIV_* as the
'error' argument instead of POWERPC_EXCP_INVAL_*, and POWERPC_EXCP_FU is
an exception type, not an exception error code. To correctly set
FSCR[IC], we should raise Facility Unavailable with this exception type
and IC value as the error code.

Fixes: 565cb1096733 ("target/ppc: add user read/write functions for MMCR0")
Signed-off-by: Matheus Ferst <matheus.ferst@eldorado.org.br>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20220627141104.669152-6-matheus.ferst@eldorado.org.br>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>

* target/ppc: fix exception error code in spr_write_excp_vector

The 'error' argument of gen_inval_exception will be or-ed with
POWERPC_EXCP_INVAL, so it should always be a constant prefixed with
POWERPC_EXCP_INVAL_. No functional change is intended,
spr_write_excp_vector is only used by register_BookE_sprs, and
powerpc_excp_booke ignores the lower 4 bits of the error code on
POWERPC_EXCP_INVAL exceptions.

Also, take the opportunity to replace printf with qemu_log_mask.

Signed-off-by: Matheus Ferst <matheus.ferst@eldorado.org.br>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20220627141104.669152-7-matheus.ferst@eldorado.org.br>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>

* target/ppc: Move tlbie[l] to decode tree

Also decode RIC, PRS and R operands.

Signed-off-by: Leandro Lupori <leandro.lupori@eldorado.org.br>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20220712193741.59134-2-leandro.lupori@eldorado.org.br>
[danielhb: mark bit 31 in @X_tlbie pattern as ignored]
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>

* target/ppc: Implement ISA 3.00 tlbie[l]

This initial version supports the invalidation of one or all
TLB entries. Flushing by PID/LPID, or based on process/partition
scope, is not supported, because it would make using the
generic QEMU TLB implementation hard. In these cases, all
entries are flushed.

Signed-off-by: Leandro Lupori <leandro.lupori@eldorado.org.br>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20220712193741.59134-3-leandro.lupori@eldorado.org.br>
[danielhb: moved 'set' declaration to TLBIE_RIC_PWC block]
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>

* target/ppc: receive DisasContext explicitly in GEN_PRIV

GEN_PRIV and the related CHK_* macros just assumed that a variable
named "ctx" would be in scope wherever they are used, and that it
would be a pointer to DisasContext. Change these macros to receive
the pointer explicitly.

Reviewed-by: Leandro Lupori <leandro.lupori@eldorado.org.br>
Signed-off-by: Matheus Ferst <matheus.ferst@eldorado.org.br>
Signed-off-by: Lucas Coutinho <lucas.coutinho@eldorado.org.br>
Message-Id: <20220701133507.740619-2-lucas.coutinho@eldorado.org.br>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>

* target/ppc: add macros to check privilege level

Equivalent to CHK_SV and CHK_HV, but can be used in decodetree methods.

Reviewed-by: Leandro Lupori <leandro.lupori@eldorado.org.br>
Signed-off-by: Matheus Ferst <matheus.ferst@eldorado.org.br>
Signed-off-by: Lucas Coutinho <lucas.coutinho@eldorado.org.br>
Message-Id: <20220701133507.740619-3-lucas.coutinho@eldorado.org.br>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>

* target/ppc: Move slbie to decodetree

Reviewed-by: Leandro Lupori <leandro.lupori@eldorado.org.br>
Signed-off-by: Lucas Coutinho <lucas.coutinho@eldorado.org.br>
Message-Id: <20220701133507.740619-4-lucas.coutinho@eldorado.org.br>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>

* target/ppc: Move slbieg to decodetree

Reviewed-by: Leandro Lupori <leandro.lupori@eldorado.org.br>
Signed-off-by: Lucas Coutinho <lucas.coutinho@eldorado.org.br>
Message-Id: <20220701133507.740619-5-lucas.coutinho@eldorado.org.br>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>

* target/ppc: Move slbia to decodetree

Reviewed-by: Leandro Lupori <leandro.lupori@eldorado.org.br>
Signed-off-by: Lucas Coutinho <lucas.coutinho@eldorado.org.br>
Message-Id: <20220701133507.740619-6-lucas.coutinho@eldorado.org.br>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>

* target/ppc: Move slbmte to decodetree

Reviewed-by: Leandro Lupori <leandro.lupori@eldorado.org.br>
Signed-off-by: Lucas Coutinho <lucas.coutinho@eldorado.org.br>
Message-Id: <20220701133507.740619-7-lucas.coutinho@eldorado.org.br>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>

* target/ppc: Move slbmfev to decodetree

Reviewed-by: Leandro Lupori <leandro.lupori@eldorado.org.br>
Signed-off-by: Lucas Coutinho <lucas.coutinho@eldorado.org.br>
Message-Id: <20220701133507.740619-8-lucas.coutinho@eldorado.org.br>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>

* target/ppc: Move slbmfee to decodetree

Reviewed-by: Leandro Lupori <leandro.lupori@eldorado.org.br>
Signed-off-by: Lucas Coutinho <lucas.coutinho@eldorado.org.br>
Message-Id: <20220701133507.740619-9-lucas.coutinho@eldorado.org.br>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>

* target/ppc: Move slbfee to decodetree

Reviewed-by: Leandro Lupori <leandro.lupori@eldorado.org.br>
Signed-off-by: Lucas Coutinho <lucas.coutinho@eldorado.org.br>
Message-Id: <20220701133507.740619-10-lucas.coutinho@eldorado.org.br>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>

* target/ppc: Move slbsync to decodetree

Reviewed-by: Leandro Lupori <leandro.lupori@eldorado.org.br>
Signed-off-by: Lucas Coutinho <lucas.coutinho@eldorado.org.br>
Message-Id: <20220701133507.740619-11-lucas.coutinho@eldorado.org.br>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>

* target/ppc: Implement slbiag

Reviewed-by: Leandro Lupori <leandro.lupori@eldorado.org.br>
Signed-off-by: Lucas Coutinho <lucas.coutinho@eldorado.org.br>
Message-Id: <20220701133507.740619-12-lucas.coutinho@eldorado.org.br>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>

* target/ppc: check tb_env != 0 before printing TBU/TBL/DECR

When using "-machine none", env->tb_env is not allocated, causing the
segmentation fault reported in issue #85 (launchpad bug #811683). To
avoid this problem, check if the pointer != NULL before calling the
methods to print TBU/TBL/DECR.

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/85
Signed-off-by: Matheus Ferst <matheus.ferst@eldorado.org.br>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20220714172343.80539-1-matheus.ferst@eldorado.org.br>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>

* ppc: Check partition and process table alignment

Check if partition and process tables are properly aligned, in
their size, according to PowerISA 3.1B, Book III 6.7.6 programming
note. Hardware and KVM also raise an exception in these cases.

Signed-off-by: Leandro Lupori <leandro.lupori@eldorado.org.br>
Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com>
Message-Id: <20220628133959.15131-2-leandro.lupori@eldorado.org.br>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>

* target/ppc: Improve Radix xlate level validation

Check if the number and size of Radix levels are valid on
POWER9/POWER10 CPUs, according to the supported Radix Tree
Configurations described in their User Manuals.

Signed-off-by: Leandro Lupori <leandro.lupori@eldorado.org.br>
Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com>
Message-Id: <20220628133959.15131-3-leandro.lupori@eldorado.org.br>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>

* target/ppc: Check page dir/table base alignment

According to PowerISA 3.1B, Book III 6.7.6 programming note, the
page directory base addresses are expected to be aligned to their
size. Real hardware seems to rely on that and will access the
wrong address if they are misaligned. This results in a
translation failure even if the page tables seem to be properly
populated.

Signed-off-by: Leandro Lupori <leandro.lupori@eldorado.org.br>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20220628133959.15131-4-leandro.lupori@eldorado.org.br>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>

* qga: treat get-guest-fsinfo as "best effort"

In some container environments, there may be references to block devices
witnessable from a container through /proc/self/mountinfo that reference
devices we simply don't have access to in the container, and cannot
provide information about.

Instead of failing the entire fsinfo command, return stub information
for these failed lookups.

This allows test-qga to pass under docker tests, which are in turn used
by the CentOS VM tests.

Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20220708153503.18864-2-jsnow@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>

* tests/vm: use 'cp' instead of 'ln' for temporary vm images

If the initial setup fails, you've permanently altered the state of the
downloaded image in an unknowable way. Use 'cp' like our other test
setup scripts do.

Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20220708153503.18864-3-jsnow@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>

* tests/vm: switch CentOS 8 to CentOS 8 Stream

The old CentOS image didn't work anymore because it was already EOL at
the beginning of 2022.

Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20220708153503.18864-4-jsnow@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>

* tests/vm: switch centos.aarch64 to CentOS 8 Stream

Switch this test over to using a cloud image like the base CentOS8 VM
test, which helps make this script a bit simpler too.

Note: At time of writing, this test seems pretty flaky when run without
KVM support for aarch64. Certain unit tests like migration-test,
virtio-net-failover, test-hmp and qom-test seem quite prone to fail
under TCG. Still, this is an improvement in that at least pure build
tests are functional.

Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20220708153503.18864-5-jsnow@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>

* tests/vm: upgrade Ubuntu 18.04 VM to 20.04

18.04 has fallen out of our support window, so move ubuntu.aarch64
forward to ubuntu 20.04, which is now our oldest supported Ubuntu
release.

Notes:

This checksum changes periodically; use a fixed point image with a known
checksum so that the image isn't re-downloaded on every single
invocation. (The checksum for the 18.04 image was already incorrect at
the time of writing.)

Just like the centos.aarch64 test, this test currently seems very
flaky when run as a TCG test.

Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20220708153503.18864-6-jsnow@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>

* tests/vm: remove ubuntu.i386 VM test

Ubuntu 18.04 is out of our support window, and Ubuntu 20.04 does not
support i386 anymore. The Debian project does, but it does not provide
any cloud images for it, so a new expect-style script would have to be
written.

Since we have i386 cross-compiler tests hosted on GitLab CI, we don't
need to support this VM test anymore.

Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20220708153503.18864-7-jsnow@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>

* tests/vm: remove duplicate 'centos' VM test

This is listed twice by accident; we require genisoimage to run the
test, so remove the unconditional entry.

Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20220708153503.18864-8-jsnow@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>

* tests/vm: add 1GB extra memory per core

If you try to run a 16 or 32 threaded test, you're going to run out of
memory very quickly with qom-test and a few others. Bump the memory
limit to try to scale with larger-core machines.

Granted, this means that a 16 core processor is going to ask for 16GB,
but you *probably* meet that requirement if you have such a machine.

512MB per core didn't seem to be enough to avoid ENOMEM and SIGABRTs in
the test cases in practice on a six core machine; so I bumped it up to
1GB which seemed to help.

Add this magic in early to the configuration process so that the
config file, if provided, can still override it.

Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Acked-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20220708153503.18864-9-jsnow@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>

* tests/vm: Remove docker cross-compile test from CentOS VM

The fedora container has since been split apart, so there's no suitable
nearby target that would support "test-mingw", as it requires both x32
and x64 support -- neither fedora-cross-win32 nor fedora-cross-win64
would be truly suitable on its own.

Just remove this test as superfluous with our current CI infrastructure.

Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20220708153503.18864-10-jsnow@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>

* qtest/machine-none: Add LoongArch support

Update the cpu_maps[] to support the LoongArch target.

Signed-off-by: Song Gao <gaosong@loongson.cn>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20220713020258.601424-1-gaosong@loongson.cn>
Signed-off-by: Thomas Huth <thuth@redhat.com>

* tests/unit: Replace g_memdup() by g_memdup2()

Per https://discourse.gnome.org/t/port-your-module-from-g-memdup-to-g-memdup2-now/5538

  The old API took the size of the memory to duplicate as a guint,
  whereas most memory functions take memory sizes as a gsize. This
  made it easy to accidentally pass a gsize to g_memdup(). For large
  values, that would lead to a silent truncation of the size from 64
  to 32 bits, and result in a heap area being returned which is
  significantly smaller than what the caller expects. This can likely
  be exploited in various modules to cause a heap buffer overflow.

Replace g_memdup() by the safer g_memdup2() wrapper.

Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20210903174510.751630-24-philmd@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>

* Replace 'whitelist' with 'allow'

Let's use more inclusive language here and avoid terms
that are frowned upon nowadays.

Message-Id: <20220711095300.60462-1-thuth@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Thomas Huth <thuth@redhat.com>

* util: Fix broken build on Haiku

A recent commit moved some Haiku-specific code parts from oslib-posix.c
to cutils.c, but failed to move the corresponding header #include
statement, too, so "make vm-build-haiku.x86_64" is currently broken.
Fix it by moving the header #include, too.

Fixes: 06680b15b4 ("include: move qemu_*_exec_dir() to cutils")
Message-Id: <20220718172026.139004-1-thuth@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>

* python/qemu/qmp/legacy: Replace 'returns-whitelist' with the correct type

'returns-whitelist' has been renamed to 'command-returns-exceptions' in
commit b86df3747848 ("qapi: Rename pragma *-whitelist to *-exceptions").

Message-Id: <20220711095721.61280-1-thuth@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>

* pl050: move PL050State from pl050.c to new pl050.h header file

This allows the QOM types in pl050.c to be used elsewhere by simply including
pl050.h.

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Tested-by: Helge Deller <deller@gmx.de>
Acked-by: Helge Deller <deller@gmx.de>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-Id: <20220712215251.7944-2-mark.cave-ayland@ilande.co.uk>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>

* pl050: rename pl050_keyboard_init() to pl050_kbd_init()

This is for consistency with all of the other devices that use the PS2 keyboard
device.

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Tested-by: Helge Deller <deller@gmx.de>
Acked-by: Helge Deller <deller@gmx.de>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-Id: <20220712215251.7944-3-mark.cave-ayland@ilande.co.uk>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>

* pl050: change PL050State dev pointer from void to PS2State

This allows the compiler to enforce that the PS2 device pointer is always of
type PS2State. Update the name of the pointer from dev to ps2dev to emphasise
this type change.

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Tested-by: Helge Deller <deller@gmx.de>
Acked-by: Helge Deller <deller@gmx.de>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-Id: <20220712215251.7944-4-mark.cave-ayland@ilande.co.uk>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>

* pl050: introduce new PL050_KBD_DEVICE QOM type

This will soon be used to hold the underlying PS2_KBD_DEVICE object.

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Tested-by: Helge Deller <deller@gmx.de>
Acked-by: Helge Deller <deller@gmx.de>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-Id: <20220712215251.7944-5-mark.cave-ayland@ilande.co.uk>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>

* pl050: introduce new PL050_MOUSE_DEVICE QOM type

This will soon be used to hold the underlying PS2_MOUSE_DEVICE object.

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Tested-by: Helge Deller <deller@gmx.de>
Acked-by: Helge Deller <deller@gmx.de>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-Id: <20220712215251.7944-6-mark.cave-ayland@ilande.co.uk>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>

* pl050: move logic from pl050_realize() to pl050_init()

The logic for initialising the register memory region and the sysbus output IRQ
does not depend upon any device properties and so can be moved from
pl050_realize() to pl050_init().

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Tested-by: Helge Deller <deller@gmx.de>
Acked-by: Helge Deller <deller@gmx.de>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-Id: <20220712215251.7944-7-mark.cave-ayland@ilande.co.uk>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>

* pl050: introduce PL050DeviceClass for the PL050 device

This will soon be used to store the reference to the PL050 parent device
for PL050_KBD_DEVICE and PL050_MOUSE_DEVICE.

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Tested-by: Helge Deller <deller@gmx.de>
Acked-by: Helge Deller <deller@gmx.de>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-Id: <20220712215251.7944-8-mark.cave-ayland@ilande.co.uk>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>

* pl050: introduce pl050_kbd_class_init() and pl050_kbd_realize()

Introduce a new pl050_kbd_class_init() function containing a call to
device_class_set_parent_realize() which calls a new pl050_kbd_realize()
function to initialise the PS2 keyboard device.

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Tested-by: Helge Deller <deller@gmx.de>
Acked-by: Helge Deller <deller@gmx.de>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-Id: <20220712215251.7944-9-mark.cave-ayland@ilande.co.uk>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>

* pl050: introduce pl050_mouse_class_init() and pl050_mouse_realize()

Introduce a new pl050_mouse_class_init() function containing a call to
device_class_set_parent_realize() which calls a new pl050_mouse_realize()
function to initialise the PS2 mouse device.

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Tested-by: Helge Deller <deller@gmx.de>
Acked-by: Helge Deller <deller@gmx.de>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-Id: <20220712215251.7944-10-mark.cave-ayland@ilande.co.uk>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>

* pl050: don't use legacy ps2_kbd_init() function

Instantiate the PS2 keyboard device within PL050KbdState using
object_initialize_child() in pl050_kbd_init() and realize it in
pl050_kbd_realize() accordingly.

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Tested-by: Helge Deller <deller@gmx.de>
Acked-by: Helge Deller <deller@gmx.de>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-Id: <20220712215251.7944-11-mark.cave-ayland@ilande.co.uk>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>

* pl050: don't use legacy ps2_mouse_init() function

Instantiate the PS2 mouse device within PL050MouseState using
object_initialize_child() in pl050_mouse_init() and realize it in
pl050_mouse_realize() accordingly.

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Tested-by: Helge Deller <deller@gmx.de>
Acked-by: Helge Deller <deller@gmx.de>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-Id: <20220712215251.7944-12-mark.cave-ayland@ilande.co.uk>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>

* lasips2: don't use vmstate_register() in lasips2_realize()

Since lasips2 is a qdev device, vmstate_lasips2 can be registered using
the DeviceClass vmsd field instead.

Note that due to the use of the base parameter in the original vmstate_register()
function call, this is actually a migration break for the HPPA B160L machine.

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Tested-by: Helge Deller <deller@gmx.de>
Acked-by: Helge Deller <deller@gmx.de>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-Id: <20220712215251.7944-13-mark.cave-ayland@ilande.co.uk>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>

* lasips2: remove the qdev base property and the lasips2_properties array

The base property was only needed for use by vmstate_register() in order to
preserve migration compatibility. Now that the lasips2 migration state is
registered through the DeviceClass vmsd field, the base property and also
the lasips2_properties array can be removed completely as they are no longer
required.

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Tested-by: Helge Deller <deller@gmx.de>
Acked-by: Helge Deller <deller@gmx.de>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-Id: <20220712215251.7944-14-mark.cave-ayland@ilande.co.uk>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>

* lasips2: remove legacy lasips2_initfn() function

There is only one user of the legacy lasips2_initfn() function which is in
machine_hppa_init(), so inline its functionality into machine_hppa_init() and
then remove it.

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Tested-by: Helge Deller <deller@gmx.de>
Acked-by: Helge Deller <deller@gmx.de>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-Id: <20220712215251.7944-15-mark.cave-ayland@ilande.co.uk>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>

* lasips2: change LASIPS2State dev pointer from void to PS2State

This allows the compiler to enforce that the PS2 device pointer is always of
type PS2State. Update the name of the pointer from dev to ps2dev to emphasise
this type change.

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Tested-by: Helge Deller <deller@gmx.de>
Acked-by: Helge Deller <deller@gmx.de>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-Id: <20220712215251.7944-16-mark.cave-ayland@ilande.co.uk>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>

* lasips2: QOMify LASIPS2Port

This becomes an abstract QOM type which will be a parent type for separate
keyboard and mouse port types.

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Tested-by: Helge Deller <deller@gmx.de>
Acked-by: Helge Deller <deller@gmx.de>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-Id: <20220712215251.7944-17-mark.cave-ayland@ilande.co.uk>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>

* lasips2: introduce new LASIPS2_KBD_PORT QOM type

This will soon be used to hold the underlying PS2_KBD_DEVICE object.

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Tested-by: Helge Deller <deller@gmx.de>
Acked-by: Helge Deller <deller@gmx.de>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-Id: <20220712215251.7944-18-mark.cave-ayland@ilande.co.uk>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>

* lasips2: introduce new LASIPS2_MOUSE_PORT QOM type

This will soon be used to hold the underlying PS2_MOUSE_DEVICE object.

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Tested-by: Helge Deller <deller@gmx.de>
Acked-by: Helge Deller <deller@gmx.de>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-Id: <20220712215251.7944-19-mark.cave-ayland@ilande.co.uk>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>

* lasips2: move keyboard port initialisation to new lasips2_kbd_port_init() function

Move the initialisation of the keyboard port from lasips2_init() to
a new lasips2_kbd_port_init() function which will be invoked using
object_initialize_child() during the LASIPS2 device init.

Update LASIPS2State so that it now holds the new LASIPS2KbdPort child object and
ensure that it is realised in lasips2_realize().

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Tested-by: Helge Deller <deller@gmx.de>
Acked-by: Helge Deller <deller@gmx.de>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-Id: <20220712215251.7944-20-mark.cave-ayland@ilande.co.uk>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>

* lasips2: move mouse port initialisation to new lasips2_mouse_port_init() function

Move the initialisation of the mouse port from lasips2_init() to
a new lasips2_mouse_port_init() function which will be invoked using
object_initialize_child() during the LASIPS2 device init.

Update LASIPS2State so that it now holds the new LASIPS2MousePort child object and
ensure that it is realised in lasips2_realize().

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Tested-by: Helge Deller <deller@gmx.de>
Acked-by: Helge Deller <deller@gmx.de>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-Id: <20220712215251.7944-21-mark.cave-ayland@ilande.co.uk>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>

* lasips2: introduce lasips2_kbd_port_class_init() and lasips2_kbd_port_realize()

Introduce a new lasips2_kbd_port_class_init() function which uses a new
lasips2_kbd_port_realize() function to initialise the PS2 keyboard device.

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Tested-by: Helge Deller <deller@gmx.de>
Acked-by: Helge Deller <deller@gmx.de>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-Id: <20220712215251.7944-22-mark.cave-ayland@ilande.co.uk>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>

* lasips2: introduce lasips2_mouse_port_class_init() and lasips2_mouse_port_realize()

Introduce a new lasips2_mouse_port_class_init() function which uses a new
lasips2_mouse_port_realize() function to initialise the PS2 mouse device.

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Tested-by: Helge Deller <deller@gmx.de>
Acked-by: Helge Deller <deller@gmx.de>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-Id: <20220712215251.7944-23-mark.cave-ayland@ilande.co.uk>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>

* lasips2: rename LASIPS2Port irq field to birq

The existing boolean irq field in LASIPS2Port will soon be replaced by a proper
qemu_irq, so rename the field to birq to allow the upcoming qemu_irq to use the
irq name.

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Tested-by: Helge Deller <deller@gmx.de>
Acked-by: Helge Deller <deller@gmx.de>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-Id: <20220712215251.7944-24-mark.cave-ayland@ilande.co.uk>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>

* lasips2: introduce port IRQ and new lasips2_port_init() function

Introduce a new lasips2_port_init() QOM init function for the LASIPS2_PORT type
and use it to initialise a new gpio for use as a port IRQ. Add a new qemu_irq
representing the gpio as a new irq field within LASIPS2Port.

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Tested-by: Helge Deller <deller@gmx.de>
Acked-by: Helge Deller <deller@gmx.de>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-Id: <20220712215251.7944-25-mark.cave-ayland@ilande.co.uk>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>

* lasips2: introduce LASIPS2PortDeviceClass for the LASIPS2_PORT device

This will soon be used to store the reference to the LASIPS2_PORT parent device
for LASIPS2_KBD_PORT and LASIPS2_MOUSE_PORT.

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Tested-by: Helge Deller <deller@gmx.de>
Acked-by: Helge Deller <deller@gmx.de>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-Id: <20220712215251.7944-26-mark.cave-ayland@ilande.co.uk>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>

* lasips2: add named input gpio to port for downstream PS2 device IRQ

The named input gpio is to be connected to the IRQ output of the downstream
PS2 device and used to drive the port IRQ. Initialise the named input gpio
in lasips2_port_init() and add new lasips2_port_class_init() and
lasips2_port_realize() functions to connect the PS2 device output gpio to
the new named input gpio.

Note that the reference to lasips2_port_realize() is stored in
LASIPS2PortDeviceClass but not yet used.

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Tested-by: Helge Deller <deller@gmx.de>
Acked-by: Helge Deller <deller@gmx.de>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-Id: <20220712215251.7944-27-mark.cave-ayland@ilande.co.uk>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>

* lasips2: add named input gpio to handle incoming port IRQs

The LASIPS2 device named input gpio is soon to be connected to the port output
IRQs. Add a new int_status field to LASIPS2State which is a bitmap representing
the port input IRQ status which will be enabled in the next patch.

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Tested-by: Helge Deller <deller@gmx.de>
Acked-by: Helge Deller <deller@gmx.de>
Message-Id: <20220712215251.7944-28-mark.cave-ayland@ilande.co.uk>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>

* lasips2: switch to using port-based IRQs

Now we can implement port-based IRQs by wiring the PS2 device IRQs to the
LASIPS2Port named input gpios rather than directly to the LASIPS2 device, and
generate the LASIPS2 output IRQ from the int_status bitmap representing the
individual port IRQs instead of the birq boolean.

This enables us to remove the separate PS2 keyboard and PS2 mouse named input
gpios from the LASIPS2 device and simplify the register implementation to
drive the port IRQ using qemu_set_irq() rather than accessing the LASIPS2
device IRQs directly. As a consequence the IRQ level logic in lasips2_set_irq()
can also be simplified accordingly.

For now this patch ignores adding the int_status bitmap and simply drops the
birq boolean from the vmstate_lasips2 VMStateDescription. This is because the
migration stream is already missing some required LASIPS2 fields, and as this
series already introduces a migration break for the lasips2 device it is
easiest to fix this in a follow-up patch.

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Tested-by: Helge Deller <deller@gmx.de>
Acked-by: Helge Deller <deller@gmx.de>
Message-Id: <20220712215251.7944-29-mark.cave-ayland@ilande.co.uk>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>

* lasips2: rename LASIPS2Port parent pointer to lasips2

This makes it clearer that the pointer is a reference to the LASIPS2 container
device rather than an implied part of the QOM hierarchy.

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Tested-by: Helge Deller <deller@gmx.de>
Acked-by: Helge Deller <deller@gmx.de>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-Id: <20220712215251.7944-30-mark.cave-ayland@ilande.co.uk>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>

* lasips2: standardise on lp name for LASIPS2Port variables

This is shorter to type and keeps the naming convention consistent within the
LASIPS2 device.

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Tested-by: Helge Deller <deller@gmx.de>
Acked-by: Helge Deller <deller@gmx.de>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-Id: <20220712215251.7944-31-mark.cave-ayland@ilande.co.uk>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>

* lasips2: switch register memory region to DEVICE_BIG_ENDIAN

The LASI device (and therefore also the LASIPS2 device) is only used by the
HPPA B160L machine, which is a big-endian architecture.

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Tested-by: Helge Deller <deller@gmx.de>
Acked-by: Helge Deller <deller@gmx.de>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-Id: <20220712215251.7944-32-mark.cave-ayland@ilande.co.uk>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>

* lasips2: don't use legacy ps2_kbd_init() function

Instantiate the PS2 keyboard device within LASIPS2KbdPort using
object_initialize_child() in lasips2_kbd_port_init() and realize it in
lasips2_kbd_port_realize() accordingly.

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Tested-by: Helge Deller <deller@gmx.de>
Acked-by: Helge Deller <deller@gmx.de>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-Id: <20220712215251.7944-33-mark.cave-ayland@ilande.co.uk>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>

* lasips2: don't use legacy ps2_mouse_init() function

Instantiate the PS2 mouse device within LASIPS2MousePort using
object_initialize_child() in lasips2_mouse_port_init() and realize it in
lasips2_mouse_port_realize() accordingly.

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Tested-by: Helge Deller <deller@gmx.de>
Acked-by: Helge Deller <deller@gmx.de>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-Id: <20220712215251.7944-34-mark.cave-ayland@ilande.co.uk>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>

* lasips2: update VMStateDescription for LASIPS2 device

Since this series has already introduced a migration break for the HPPA B160L
machine, we can use this opportunity to improve the VMStateDescription for
the LASIPS2 device.

Add the new int_status field to the VMStateDescription and remodel the ports
as separate VMSTATE_STRUCT instances representing each LASIPS2Port. Once this
is done, the migration stream can be updated to include buf and loopback_rbne
for each port (which is necessary since the values are accessed across separate
IO accesses), and drop the port id as this is hardcoded for each port type.

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Tested-by: Helge Deller <deller@gmx.de>
Acked-by: Helge Deller <deller@gmx.de>
Message-Id: <20220712215251.7944-35-mark.cave-ayland@ilande.co.uk>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>

* pckbd: introduce new vmstate_kbd_mmio VMStateDescription for the I8042_MMIO device

This enables us to register the VMStateDescription using the DeviceClass vmsd
property rather than having to call vmstate_register() from i8042_mmio_realize().

Note that this is a migration break for the MIPS magnum machine which is the only
user of the I8042_MMIO device.

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Tested-by: Helge Deller <deller@gmx.de>
Acked-by: Helge Deller <deller@gmx.de>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-Id: <20220712215251.7944-36-mark.cave-ayland@ilande.co.uk>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>

* pckbd: don't use legacy ps2_kbd_init() function

Instantiate the PS2 keyboard device within KBDState using
object_initialize_child() in i8042_initfn() and i8042_mmio_init() and realize
it in i8042_realizefn() and i8042_mmio_realize() accordingly.

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Tested-by: Helge Deller <deller@gmx.de>
Acked-by: Helge Deller <deller@gmx.de>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-Id: <20220712215251.7944-37-mark.cave-ayland@ilande.co.uk>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>

* ps2: remove unused legacy ps2_kbd_init() function

Now that the legacy ps2_kbd_init() function is no longer used, it can be completely
removed along with its associated trace-event.

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Tested-by: Helge Deller <deller@gmx.de>
Acked-by: Helge Deller <deller@gmx.de>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-Id: <20220712215251.7944-38-mark.cave-ayland@ilande.co.uk>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>

* pckbd: don't use legacy ps2_mouse_init() function

Instantiate the PS2 mouse device within KBDState using
object_initialize_child() in i8042_initfn() and i8042_mmio_init() and realize
it in i8042_realizefn() and i8042_mmio_realize() accordingly.

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Tested-by: Helge Deller <deller@gmx.de>
Acked-by: Helge Deller <deller@gmx.de>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-Id: <20220712215251.7944-39-mark.cave-ayland@ilande.co.uk>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>

* ps2: remove unused legacy ps2_mouse_init() function

Now that the legacy ps2_mouse_init() function is no longer used, it can be completely
removed along with its associated trace-event.

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Tested-by: Helge Deller <deller@gmx.de>
Acked-by: Helge Deller <deller@gmx.de>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-Id: <20220712215251.7944-40-mark.cave-ayland@ilande.co.uk>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>

* pckbd: remove legacy i8042_mm_init() function

This legacy function is only used during the initialisation of the MIPS magnum
machine, so inline its functionality directly into mips_jazz_init() and then
remove it.

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Tested-by: Helge Deller <deller@gmx.de>
Acked-by: Helge Deller <deller@gmx.de>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-Id: <20220712215251.7944-41-mark.cave-ayland@ilande.co.uk>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>

* util: Fix broken build on Haiku

A recent commit moved some Haiku-specific code parts from oslib-posix.c
to cutils.c, but failed to move the corresponding header #include
statement, too, so "make vm-build-haiku.x86_64" is currently broken.
Fix it by moving the header #include, too.

Fixes: 06680b15b4 ("include: move qemu_*_exec_dir() to cutils")
Signed-off-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20220718172026.139004-1-thuth@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

* target/s390x: fix handling of zeroes in vfmin/vfmax

vfmin_res() / vfmax_res() are trying to check whether a and b are both
zeroes, but in reality they check that they are the same kind of zero.
This causes incorrect results when comparing positive and negative
zeroes.

Fixes: da4807527f3b ("s390x/tcg: Implement VECTOR FP (MAXIMUM|MINIMUM)")
Co-developed-by: Ulrich Weigand <ulrich.weigand@de.ibm.com>
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: David Hildenbrand <david@redhat.com>
Message-Id: <20220713182612.3780050-2-iii@linux.ibm.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>

* target/s390x: fix NaN propagation rules

s390x has the same NaN propagation rules as ARM, and not as x86.

Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: David Hildenbrand <david@redhat.com>
Message-Id: <20220713182612.3780050-3-iii@linux.ibm.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>

* tests/tcg/s390x: test signed vfmin/vfmax

Add a test to prevent regressions. Try all floating point value sizes
and all combinations of floating point value classes. Verify the results
against PoP tables, which are represented as close to the original as
possible - this produces a lot of checkpatch complaints, but it seems
to be justified in this case.

Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20220713182612.3780050-4-iii@linux.ibm.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>

* dbus-display: fix test race when initializing p2p connection

The D-Bus connection starts processing messages before QEMU has the time
to set the object manager server. This is causing dbus-display-test to
fail randomly with:

ERROR:../tests/qtest/dbus-display-test.c:68:test_dbus_display_vm:
assertion failed
(qemu_dbus_display1_vm_get_name(QEMU_DBUS_DISPLAY1_VM(vm)) ==
"dbus-test"): (NULL == "dbus-test") ERROR

Use the delayed message processing flag and method to avoid that
situation.

(the bus connection doesn't need a fix, as the initialization is done
synchronously)

Reported-by: Robinson, Cole <crobinso@redhat.com>
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Tested-by: Cole Robinson <crobinso@redhat.com>
Message-Id: <20220609152647.870373-1-marcandre.lureau@redhat.com>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>

* microvm: turn off io reservations for pcie root ports

The pcie host bridge has no io window on microvm,
so io reservations will not work.

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Message-Id: <20220701091516.43489-1-kraxel@redhat.com>

* usb/hcd-xhci: check slotid in xhci_wakeup_endpoint()

This prevents an OOB read (followed by an assertion failure in
xhci_kick_ep) when slotid > xhci->numslots.

Reported-by: Soul Chen <soulchen8650@gmail.com>
Signed-off-by: Mauro Matteo Cascella <mcascell@redhat.com>
Message-Id: <20220705174734.2348829-1-mcascell@redhat.com>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>

* usb: document guest-reset and guest-reset-all

Suggested-by: Michal Prívozník <mprivozn@redhat.com>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Message-Id: <20220711094437.3995927-2-kraxel@redhat.com>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>

* usb: document pcap (aka usb traffic capture)

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Message-Id: <20220711094437.3995927-3-kraxel@redhat.com>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>

* gtk: Add show_tabs=on|off command line option.

The patch adds a "show_tabs" command line option for the GTK ui, similar to
"grab_on_hover". This option avoids having to enable the tabbed view mode
by hand at each start of the VM.

Signed-off-by: Felix "xq" Queißner <xq@random-projects.net>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Hanna Reitz <hreitz@redhat.com>
Message-Id: <20220712133753.18937-1-xq@random-projects.net>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>

* tests/docker/dockerfiles: Add debian-loongarch-cross.docker

Use the pre-packaged toolchain provided by Loongson via github.

Tested-by: Song Gao <gaosong@loongson.cn>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20220704070824.965429-1-richard.henderson@linaro.org>

* target/loongarch: Fix loongarch_cpu_class_by_name

The cpu_model argument may already have the '-loongarch-cpu' suffix,
e.g. when using the default for the LS7A1000 machine.  If that fails,
try again with the suffix.  Validate that the object created by the
function is derived from the proper base class.

Signed-off-by: Xiaojuan Yang <yangxiaojuan@loongson.cn>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20220715060740.1500628-2-yangxiaojuan@loongson.cn>
[rth: Try without and then with the suffix, to avoid testsuite breakage.]
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

* hw/intc/loongarch_pch_pic: Fix bugs for update_irq function

Fix the following errors:
1. find_first_bit() must not be passed an argument that is not an
'unsigned long' array, so use ctz64() instead of find_first_bit()
to fix this bug.
2. It is not standard to use '1ULL << irq' to generate an IRQ mask,
so replace it with 'MAKE_64BIT_MASK(irq, 1)'.

Fix coverity CID: 1489761 1489764 1489765

Signed-off-by: Xiaojuan Yang <yangxiaojuan@loongson.cn>
Message-Id: <20220715060740.1500628-3-yangxiaojuan@loongson.cn>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

* target/loongarch/cpu: Fix coverity errors about excp_names

Fix out-of-bounds errors when accessing the excp_names[] array: valid
indexes run from 0 to ARRAY_SIZE(excp_names)-1, but the general code did
not consider the upper boundary.

Fix coverity CID: 1489758

Signed-off-by: Xiaojuan Yang <yangxiaojuan@loongson.cn>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20220715060740.1500628-4-yangxiaojuan@loongson.cn>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

* target/loongarch/tlb_helper: Fix coverity integer overflow error

Replace '1 << shift' with 'MAKE_64BIT_MASK(shift, 1)' to fix
unintentional integer overflow errors in the tlb_helper file.

Fix coverity CID: 1489759 1489762

Signed-off-by: Xiaojuan Yang <yangxiaojuan@loongson.cn>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20220715060740.1500628-5-yangxiaojuan@loongson.cn>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

* target/loongarch/op_helper: Fix coverity cond_at_most error

Valid indexes into the cpucfg array run from 0 to ARRAY_SIZE(cpucfg)-1,
so accessing cpucfg[] with an index beyond that boundary must be
forbidden.

Fix coverity CID: 1489760

Signed-off-by: Xiaojuan Yang <yangxiaojuan@loongson.cn>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20220715060740.1500628-6-yangxiaojuan@loongson.cn>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

* target/loongarch/cpu: Fix cpucfg default value

cpucfg[20] should be configured with the ways, sets and size arguments
of the scache when the loongarch cpu is initialised. However, the old code
wrote the 'sets argument' twice, so change one of them to the 'size argument'.

Signed-off-by: Xiaojuan Yang <yangxiaojuan@loongson.cn>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20220715064829.1521482-1-yangxiaojuan@loongson.cn>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

* fpu/softfloat: Add LoongArch specializations for pickNaN*

The muladd (inf,zero,nan) case sets InvalidOp and returns the
input value 'c', preferring sNaN over qNaN, in c,a,b order.
Binary operations prefer sNaN over qNaN, in a,b order.

Signed-off-by: Song Gao <gaosong@loongson.cn>
Message-Id: <20220716085426.3098060-3-gaosong@loongson.cn>
[rth: Add specialization for pickNaN]
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

* target/loongarch: Fix float_convd/float_convs test failing

The result should be zero when the invalid exception is raised and the operand is a NaN.

Signed-off-by: Song Gao <gaosong@loongson.cn>
Message-Id: <20220716085426.3098060-4-gaosong@loongson.cn>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

* tests/tcg/loongarch64: Add float reference files

Generated on Loongson-3A5000 (CPU revision 0x0014c011).

Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-Id: <20220104132022.2146857-1-f4bug@amsat.org>
Signed-off-by: Song Gao <gaosong@loongson.cn>
Message-Id: <20220716085426.3098060-2-gaosong@loongson.cn>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

* tests/tcg/loongarch64: Add clo related instructions test

This includes:
- CL{O/Z}.{W/D}
- CT{O/Z}.{W/D}

Signed-off-by: Song Gao <gaosong@loongson.cn>
Message-Id: <20220716085426.3098060-5-gaosong@loongson.cn>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

* tests/tcg/loongarch64: Add div and mod related instructions test

This includes:
- DIV.{W[U]/D[U]}
- MOD.{W[U]/D[U]}

Signed-off-by: Song Gao <gaosong@loongson.cn>
Message-Id: <20220716085426.3098060-6-gaosong@loongson.cn>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

* tests/tcg/loongarch64: Add fclass test

This includes:
- FCLASS.{S/D}

Signed-off-by: Song Gao <gaosong@loongson.cn>
Message-Id: <20220716085426.3098060-7-gaosong@loongson.cn>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

* tests/tcg/loongarch64: Add fp comparison instructions test

Choose some instructions to test:
- FCMP.cond.S
- cond: ceq clt cle cne seq slt sle sne

Signed-off-by: Song Gao <gaosong@loongson.cn>
Message-Id: <20220716085426.3098060-8-gaosong@loongson.cn>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

* tests/tcg/loongarch64: Add pcadd related instructions test

This includes:
- PCADDI
- PCADDU12I
- PCADDU18I
- PCALAU12I

Signed-off-by: Song Gao <gaosong@loongson.cn>
Message-Id: <20220716085426.3098060-9-gaosong@loongson.cn>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

* hw/loongarch: Add fw_cfg table support

Add fw_cfg table for loongarch virt machine, including memmap table.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Xiaojuan Yang <yangxiaojuan@loongson.cn>
Message-Id: <20220712083206.4187715-2-yangxiaojuan@loongson.cn>
[rth: Replace fprintf with assert; drop unused return value;
      initialize reserved slot to zero.]
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

* hw/loongarch: Add uefi bios loading support

Add uefi bios loading support; currently only a uefi bios has been ported
to the loongarch virt machine.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Xiaojuan Yang <yangxiaojuan@loongson.cn>
Message-Id: <20220712083206.4187715-3-yangxiaojuan@loongson.cn>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

* hw/loongarch: Add linux kernel booting support

There are two ways to start the system from a kernel file. If a bios
option is given, the system boots from the loaded bios file; otherwise it
boots from hardcoded auxcode and jumps to the kernel ELF entry.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Xiaojuan Yang <yangxiaojuan@loongson.cn>
Message-Id: <20220712083206.4187715-4-yangxiaojuan@loongson.cn>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

* hw/loongarch: Add smbios support

Add smbios support for the loongarch virt machine, and put the tables into
the fw_cfg table so that the bios can parse them quickly. The smbios spec
(version 3.6.0) is available at: https://www.dmtf.org/dsp/DSP0134.

Acked-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Xiaojuan Yang <yangxiaojuan@loongson.cn>
Message-Id: <20220712083206.4187715-5-yangxiaojuan@loongson.cn>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

* hw/loongarch: Add acpi ged support

The loongarch virt machine uses the hardware-reduced acpi model rather
than the LS7A acpi device. For now only the power management function is
used in the acpi ged device; memory hotplug will be added later. Also add
acpi tables such as RSDP/RSDT/FADT etc.

The acpi table has been submitted to the acpi spec and will be released soon.

Acked-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Xiaojuan Yang <yangxiaojuan@loongson.cn>
Message-Id: <20220712083206.4187715-6-yangxiaojuan@loongson.cn>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

* hw/loongarch: Add fdt support

Add a LoongArch flattened device tree, adding cpu device nodes, a firmware
cfg node and a pcie node to it, and create the fdt rom memory region. The
fdt info is not complete for now since only the uefi bios uses the fdt;
the linux kernel does not use it. The Loongarch Linux kernel uses the acpi
tables, which are complete in the qemu virt machine.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Xiaojuan Yang <yangxiaojuan@loongson.cn>
Message-Id: <20220712083206.4187715-7-yangxiaojuan@loongson.cn>
[rth: Set TARGET_NEED_FDT, add fdt to meson.build]
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

* Hexagon (target/hexagon) fix store w/mem_noshuf & predicated load

Call the CHECK_NOSHUF macro multiple times: once in the
fGEN_TCG_PRED_LOAD() and again in fLOAD().

Before this commit, a packet with a store and a predicated
load with mem_noshuf that gets encoded like this:

    { P0 = cmp.eq(R17,#0x0)
      memw(R18+#0x0) = R2
      if (!P0.new) R3 = memw(R17+#0x4) }

... would end up generating a branch over both the load
and the store like so:

    ...
    brcond_i32 loc17,$0x0,eq,$L1
    mov_i32 loc18,store_addr_1
    qemu_st_i32 store_val32_1,store_addr_1,leul,0
    qemu_ld_i32 loc16,loc7,leul,0
    set_label $L1
    ...

Test cases added to tests/tcg/hexagon/mem_noshuf.c

Co-authored-by: Taylor Simpson <tsimpson@quicinc.com>
Signed-off-by: Brian Cain <bcain@quicinc.com>
Signed-off-by: Taylor Simpson <tsimpson@quicinc.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20220707210546.15985-2-tsimpson@quicinc.com>

* Hexagon (target/hexagon) fix bug in mem_noshuf load exception

The semantics of a mem_noshuf packet are that the store effectively
happens before the load.  However, in cases where the load raises an
exception, we cannot simply execute the store first.

This change adds a probe to check that the load will not raise an
exception before executing the store.

If the load is predicated, this requires special handling.  We check
the condition before performing the probe.  Since we need the EA to
perform the check, we move the GET_EA portion inside CHECK_NOSHUF_PRED.
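
The probe-before-store ordering can be illustrated with a small stand-alone model; all names and the fault mechanism here are invented for illustration (the real code uses QEMU's TCG probe helpers):

```c
#include <setjmp.h>
#include <stdint.h>

/* Toy model: the store must appear to happen before the load, yet a
 * faulting load must leave the store undone.  Probe the load address
 * first; only if the probe succeeds do we execute the store and then
 * the real load. */
static jmp_buf fault;
static uint32_t mem[4];

static void probe_read(int idx)         /* "raises" if the load faults */
{
    if (idx < 0 || idx >= 4) {
        longjmp(fault, 1);
    }
}

/* Returns 1 on success (store then load done), 0 if the load faults. */
static int noshuf_packet(int st_idx, uint32_t st_val,
                         int ld_idx, uint32_t *ld_out)
{
    if (setjmp(fault)) {
        return 0;                       /* load faulted: no store done */
    }
    probe_read(ld_idx);                 /* check before any side effect */
    mem[st_idx] = st_val;               /* store happens "first" ...    */
    *ld_out = mem[ld_idx];              /* ... so the load observes it  */
    return 1;
}
```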

Test case added in tests/tcg/hexagon/mem_noshuf_exception.c

Suggested-by: Alessandro Di Federico <ale@rev.ng>
Suggested-by: Anton Johansson <anjo@rev.ng>
Signed-off-by: Taylor Simpson <tsimpson@quicinc.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20220707210546.15985-3-tsimpson@quicinc.com>

* vhost: move descriptor translation to vhost_svq_vring_write_descs

It's done for both in and out descriptors so it's better placed here.

Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>

* virtio-net: Expose MAC_TABLE_ENTRIES

vhost-vdpa control virtqueue needs to know the maximum entries supported
by the virtio-net device, so we know if it is possible to apply the
filter.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>

* virtio-net: Expose ctrl virtqueue logic

This allows external vhost-net devices to modify the state of the
VirtIO device model once the vhost-vdpa device has acknowledged the
control commands.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>

* vdpa: Avoid compiler to squash reads to used idx

In the next patch we will allow busy polling of this value. From the
compiler's point of view there is a code path where shadow_used_idx,
last_used_idx, and the vring used idx are never modified by the
busy-polling thread, so it could legally cache the read.

This was not an issue before, since we always cleared the device event
notifier before checking it, and that could act as a memory barrier.
However, busy polling needs something similar to the kernel's READ_ONCE.

Let's add it here, separated from the polling.
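
As a sketch of the kind of accessor meant here (assuming a GCC/Clang-style `__typeof__`; QEMU's actual helper for this purpose is `qatomic_read()`):

```c
#include <stdint.h>

/* Minimal READ_ONCE-style accessor, sketched after the kernel macro.
 * The volatile cast forces a reload on every use instead of letting
 * the compiler cache the value across a busy-poll loop. */
#define READ_ONCE(x) (*(const volatile __typeof__(x) *)&(x))

static uint16_t used_idx;       /* stands in for the vring used index */

/* Busy-poll, bounded so the sketch terminates; names are invented. */
static int poll_used_idx(uint16_t last, int max_spins)
{
    for (int i = 0; i < max_spins; i++) {
        if (READ_ONCE(used_idx) != last) {
            return 1;           /* the index moved: new used entries */
        }
    }
    return 0;                   /* gave up without seeing a change   */
}
```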

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>

* vhost: Reorder vhost_svq_kick

Future code needs to call it from vhost_svq_add.

No functional change intended.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>

* vhost: Move vhost_svq_kick call to vhost_svq_add

The series needs to expose vhost_svq_add with full functionality,
including the kick.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>

* vhost: Check for queue full at vhost_svq_add

The series needs to expose vhost_svq_add with full functionality,
including checking for a full queue.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>

* vhost: Decouple vhost_svq_add from VirtQueueElement

VirtQueueElement comes from the guest, but we want SVQ to be able to
modify the element presented to the device without the guest's
knowledge.

To do so, make SVQ accept sg buffers directly, instead of using
VirtQueueElement.

Add vhost_svq_add_element to keep the element-based convenience.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>

* vhost: Add SVQDescState

This will allow SVQ to add context to the different queue elements.

This patch only stores the actual element; no functional change intended.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>

* vhost: Track number of descs in SVQDescState

A guest buffer that is contiguous in GPA may need multiple descriptors
in qemu's VA, so SVQ should track its length separately.
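
As a minimal sketch of why the count must be tracked: a GPA-contiguous buffer split into fixed-size host-VA chunks consumes a ceiling-divide number of descriptors, and SVQ must remember that count to release them all when the device returns the entry (names invented for illustration):

```c
#include <stddef.h>

/* Number of descriptors a buffer of `len` bytes needs when the host
 * mapping is split into chunks of `chunk` bytes: ceil(len / chunk). */
static int descs_needed(size_t len, size_t chunk)
{
    return (int)((len + chunk - 1) / chunk);
}
```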

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>

* vhost: add vhost_svq_push_elem

This function allows external SVQ users to return guest's available
buffers.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>

* vhost: Expose vhost_svq_add

This allows external parts of SVQ to forward custom buffers to the
device.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>

* vhost: add vhost_svq_poll

It allows the Shadow Control VirtQueue to wait for the device to use the
available buffers.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>

* vhost: Add svq avail_handler callback

This allows external handlers to be aware of new buffers that the guest
places in the virtqueue.

When this callback is defined the ownership of the guest's virtqueue
element is transferred to the callback. This means that if the user
wants to forward the descriptor it needs to manually inject it. The
callback is also free to process the command by itself and use the
element with svq_push.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>

* vdpa: Export vhost_vdpa_dma_map and unmap calls

Shadow CVQ will copy buffers into qemu's VA, so we avoid TOCTOU attacks
in which the guest could set different state in the qemu device model
and the vdpa device.

To do so, it needs to be able to map these new buffers to the device.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>

* vhost-net-vdpa: add stubs for when no virtio-net device is present

net/vhost-vdpa.c will need functions that are declared in
vhost-shadow-virtqueue.c, which in turn needs functions from virtio-net.c.

Copy the vhost-vdpa-stub.c code so that only the constructor
net_init_vhost_vdpa needs to be defined.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>

* vdpa: manual forward CVQ buffers

Do a simple forwarding of CVQ buffers, the same work SVQ could do but
through callbacks. No functional change intended.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>

* vdpa: Buffer CVQ support on shadow virtqueue

Introduce the control virtqueue support for vDPA shadow virtqueue. This
is needed for advanced networking features like rx filtering.

Virtio-net control VQ copies the descriptors to qemu's VA, so we avoid
TOCTOU with the guest's or device's memory every time there is a device
model change.  Otherwise, the guest could change the memory content in
the time between qemu and the device reading it.

To demonstrate command handling, VIRTIO_NET_F_CTRL_MACADDR is
implemented.  If the virtio-net driver changes the MAC address, the
virtio-net device model is updated with the new one, and an rx filtering
change event is raised.

More CVQ commands could be added here straightforwardly, but they have
not been tested.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>

* vdpa: Extract get features part from vhost_vdpa_get_max_queue_pairs

CVQ SVQ needs to know the device features so it knows whether it can
handle all commands. Extract the feature-getting code from
vhost_vdpa_get_max_queue_pairs so we can reuse it.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>

* vdpa: Add device migration blocker

Since the vhost-vdpa device is exposing _F_LOG, add a migration blocker
if it uses CVQ.

However, qemu is able to migrate simple devices with no CVQ as long as
they use SVQ. To allow that, add a placeholder error to vhost_vdpa, and
only add it to vhost_dev when used. The vhost_dev machinery places the
migration blocker if needed.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>

* vdpa: Add x-svq to NetdevVhostVDPAOptions

Finally offering the possibility to enable SVQ from the command line.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Acked-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>

* softmmu/runstate.c: add RunStateTransition support from COLO to PRELAUNCH

If the checkpoint occurs when the guest finishes restarting
but has not started running, the runstate_set() may reject
the transition from COLO to PRELAUNCH with the crash log:

{"timestamp": {"seconds": 1593484591, "microseconds": 26605},\
"event": "RESET", "data": {"guest": true, "reason": "guest-reset"}}
qemu-system-x86_64: invalid runstate transition: 'colo' -> 'prelaunch'

Long-term testing says that it's pretty safe.

Signed-off-by: Like Xu <like.xu@linux.intel.com>
Signed-off-by: Zhang Chen <chen.zhang@intel.com>
Acked-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>

* net/colo: Fix a "double free" crash to clear the conn_list

We noticed that QEMU may crash when the guest has too many
incoming network connections, with the following log:

15197@1593578622.668573:colo_proxy_main : colo proxy connection hashtable full, clear it
free(): invalid pointer
[1]    15195 abort (core dumped)  qemu-system-x86_64 ....

This is because we create the s->connection_track_table with
g_hash_table_new_full() which is defined as:

GHashTable * g_hash_table_new_full (GHashFunc hash_func,
                       GEqualFunc key_equal_func,
                       GDestroyNotify key_destroy_func,
                       GDestroyNotify value_destroy_func);

The fourth parameter connection_destroy() will be called to free the
memory allocated for all 'Connection' values in the hashtable when
we call g_hash_table_remove_all() in the connection_hashtable_reset().

But both connection_track_table and conn_list reference the same conn
instance, which triggers a double free when conn_list is cleared. So this
patch removes the free action on the hash table side to avoid
double-freeing the conn.
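
A toy model of the ownership fix (not the actual glib code): if both containers run a destroy callback over the same shared objects, each object is destroyed twice; passing no destroy callback on the hash-table side leaves exactly one owner:

```c
#include <stddef.h>

/* Toy model: the hash table and conn_list share the same Connection
 * objects.  Count destroy callbacks to show single vs double free. */
typedef void (*destroy_fn)(void *conn);

static int destroy_count;                   /* destroys actually run */

static void conn_destroy(void *conn)
{
    (void)conn;
    destroy_count++;
}

/* Tear down both containers over n shared objects.  table_destroy
 * mirrors g_hash_table_new_full()'s value_destroy_func: the fix is to
 * pass NULL there so the list side remains the single owner. */
static int teardown(int n, destroy_fn table_destroy, destroy_fn list_destroy)
{
    destroy_count = 0;
    for (int i = 0; i < n; i++) {           /* g_hash_table_remove_all */
        if (table_destroy) {
            table_destroy(NULL);
        }
    }
    for (int i = 0; i < n; i++) {           /* conn_list clear */
        if (list_destroy) {
            list_destroy(NULL);
        }
    }
    return destroy_count;
}
```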

Signed-off-by: Like Xu <like.xu@linux.intel.com>
Signed-off-by: Zhang Chen <chen.zhang@intel.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>

* net/colo.c: No need to track conn_list for filter-rewriter

Filter-rewriter does not need to track connections in conn_list.
This patch fixes the glib g_queue_is_empty assertion seen when a COLO
guest keeps many network connections.

Signed-off-by: Zhang Chen <chen.zhang@intel.com>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>

* net/colo.c: fix segmentation fault when packet is not parsed correctly

When only one of filter-redirector and filter-mirror (or colo-compare)
has the vnet_hdr_support parameter set, COLO will crash with a
segmentation fault. Backtrace as follows:

Thread 1 "qemu-system-x86" received signal SIGSEGV, Segmentation fault.
0x0000555555cb200b in eth_get_l2_hdr_length (p=0x0)
    at /home/tao/project/COLO/colo-qemu/include/net/eth.h:296
296         uint16_t proto = be16_to_cpu(PKT_GET_ETH_HDR(p)->h_proto);
(gdb) bt
0  0x0000555555cb200b in eth_get_l2_hdr_length (p=0x0)
    at /home/tao/project/COLO/colo-qemu/include/net/eth.h:296
1  0x0000555555cb22b4 in parse_packet_early (pkt=0x555556a44840) at
net/colo.c:49
2  0x0000555555cb2b91 in is_tcp_packet (pkt=0x555556a44840) at
net/filter-rewriter.c:63

So a wrong vnet_hdr_len causes pkt->data to become NULL. Add a check
that raises an error, and add trace events to track vnet_hdr_len.

Signed-off-by: Tao Xu <tao3.xu@intel.com>
Signed-off-by: Zhang Chen <chen.zhang@intel.com>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>

* accel/kvm/kvm-all: Refactor per-vcpu dirty ring reaping

Add an optional 'CPUState' argument to kvm_dirty_ring_reap so
that it can cover the single-vcpu dirty-ring-reaping scenario.

Signed-off-by: Hyman Huang(黄勇) <huangy81@chinatelecom.cn>
Reviewed-by: Peter Xu <peterx@redhat.com>
Message-Id: <c32001242875e83b0d9f78f396fe2dcd380ba9e8.1656177590.git.huangy81@chinatelecom.cn>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

* cpus: Introduce cpu_list_generation_id

Introduce cpu_list_generation_id to track cpu list generation so
that cpu hotplug/unplug can be detected during measurement of
dirty page rate.

cpu_list_generation_id could be used to detect changes of cpu
list, which is prepared for dirty page rate measurement.
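
The generation-id pattern can be sketched in a few lines (names illustrative; the real counter lives in QEMU's cpu list code): bump a counter on every hotplug/unplug, snapshot it before a measurement, and discard the result if the counter moved.

```c
#include <stdint.h>

static uint32_t cpu_list_generation_id;

static void cpu_hotplug_event(void)
{
    cpu_list_generation_id++;       /* cpu list changed */
}

/* 1 if the cpu list stayed stable since the snapshot was taken. */
static int measurement_is_valid(uint32_t snapshot)
{
    return cpu_list_generation_id == snapshot;
}
```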

Signed-off-by: Hyman Huang(黄勇) <huangy81@chinatelecom.cn>
Reviewed-by: Peter Xu <peterx@redhat.com>
Message-Id: <06e1f1362b2501a471dce796abb065b04f320fa5.1656177590.git.huangy81@chinatelecom.cn>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

* migration/dirtyrate: Refactor dirty page rate calculation

Abstract the dirty log change logic into the function
global_dirty_log_change.

Abstract the dirty-ring-based dirty page rate calculation logic
into the function vcpu_calculate_dirtyrate.

Abstract the mathematical dirty page rate calculation into
do_calculate_dirtyrate, decoupling it from DirtyStat.

Rename set_sample_page_period to dirty_stat_wait, which is
better understood and will be reused in dirtylimit.

Handle the cpu hotplug/unplug scenario during measurement of
the dirty page rate.

Export the utility functions outside migration.

Signed-off-by: Hyman Huang(黄勇) <huangy81@chinatelecom.cn>
Reviewed-by: Peter Xu <peterx@redhat.com>
Message-Id: <7b6f6f4748d5b3d017b31a0429e630229ae97538.1656177590.git.huangy81@chinatelecom.cn>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

* softmmu/dirtylimit: Implement vCPU dirtyrate calculation periodically

Introduce a third dirty tracking method, GLOBAL_DIRTY_LIMIT, to
calculate the dirty rate periodically for the dirty page rate limit.

Add dirtylimit.c to implement the periodic dirty rate calculation,
which will be used by the dirty page rate limit.

Add dirtylimit.h to export utility functions for the dirty page rate
limit implementation.

Signed-off-by: Hyman Huang(黄勇) <huangy81@chinatelecom.cn>
Reviewed-by: Peter Xu <peterx@redhat.com>
Message-Id: <5d0d641bffcb9b1c4cc3e323b6dfecb36050d948.1656177590.git.huangy81@chinatelecom.cn>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

* accel/kvm/kvm-all: Introduce kvm_dirty_ring_size function

Introduce the kvm_dirty_ring_size utility function to help calculate
the dirty ring full time.

Signed-off-by: Hyman Huang(黄勇) <huangy81@chinatelecom.cn>
Acked-by: Peter Xu <peterx@redhat.com>
Message-Id: <f9ce1f550bfc0e3a1f711e17b1dbc8f701700e56.1656177590.git.huangy81@chinatelecom.cn>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

* softmmu/dirtylimit: Implement virtual CPU throttle

Set up a negative feedback system for the vCPU thread handling the
KVM_EXIT_DIRTY_RING_FULL exit by introducing a throttle_us_per_full
field in struct CPUState. The vCPU sleeps throttle_us_per_full
microseconds to throttle itself if dirtylimit is in service.
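
A minimal sketch of one step of such a feedback loop, with a made-up linear adjustment policy (QEMU's actual tuning in dirtylimit.c is more elaborate):

```c
#include <stdint.h>

/* Sleep longer when the vCPU dirties faster than its quota, back off
 * otherwise.  Returns the microseconds to sleep on the next
 * dirty-ring-full exit. */
static int64_t throttle_adjust(int64_t throttle_us,
                               uint64_t dirty_rate, uint64_t quota)
{
    if (dirty_rate > quota) {
        throttle_us += 1000;        /* over quota: sleep longer */
    } else if (throttle_us >= 1000) {
        throttle_us -= 1000;        /* under quota: sleep less  */
    }
    return throttle_us;
}
```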

Signed-off-by: Hyman Huang(黄勇) <huangy81@chinatelecom.cn>
Reviewed-by: Peter Xu <peterx@redhat.com>
Message-Id: <977e808e03a1cef5151cae75984658b6821be618.1656177590.git.huangy81@chinatelecom.cn>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

* softmmu/dirtylimit: Implement dirty page rate limit

Implement periodic dirty rate calculation based on the dirty ring, and
throttle the virtual CPU until it reaches the quota dirty page rate
given by the user.

Introduce the QMP commands "set-vcpu-dirty-limit",
"cancel-vcpu-dirty-limit" and "query-vcpu-dirty-limit"
to enable, disable and query the dirty page limit for virtual CPUs.

Meanwhile, introduce corresponding hmp commands
"set_vcpu_dirty_limit", "cancel_vcpu_dirty_limit",
"info vcpu_dirty_limit" so the feature can be more usable.

"query-vcpu-dirty-limit" only succeeds when dirty page rate limiting
is enabled, so add it to the list of skipped commands to ensure
qmp-cmd-test runs successfully.

Signed-off-by: Hyman Huang(黄勇) <huangy81@chinatelecom.cn>
Acked-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Message-Id: <4143f26706d413dd29db0b672fe58b3d3fbe34bc.1656177590.git.huangy81@chinatelecom.cn>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

* tests: Add dirty page rate limit test

Add a dirty page rate limit test that runs if the kernel supports the dirty ring.

The following qmp commands are covered by this test case:
"calc-dirty-rate", "query-dirty-rate", "set-vcpu-dirty-limit",
"cancel-vcpu-dirty-limit" and "query-vcpu-dirty-limit".

Signed-off-by: Hyman Huang(黄勇) <huangy81@chinatelecom.cn>
Acked-by: Peter Xu <peterx@redhat.com>
Message-Id: <eed5b847a6ef0a9c02a36383dbdd7db367dd1e7e.1656177590.git.huangy81@chinatelecom.cn>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

* multifd: Copy pages before compressing them with zlib

zlib_send_prepare() compresses pages of a running VM. zlib does not
make any thread-safety guarantees with respect to changing deflate()
input concurrently with deflate() [1].

One can observe problems due to this with the IBM zEnterprise Data
Compression accelerator capable zlib [2]. When the hardware
acceleration is enabled, migration/multifd/tcp/plain/zlib test fails
intermittently [3] due to sliding window corruption. The accelerator's
architecture explicitly discourages concurrent accesses [4]:

    Page 26-57, "Other Conditions":

    As observed by this CPU, other CPUs, and channel
    programs, references to the parameter block, first,
    second, and third operands may be multiple-access
    references, accesses to these storage locations are
    not necessarily block-concurrent, and the sequence
    of these accesses or references is undefined.

Mark Adler pointed out that vanilla zlib performs double fetches under
certain circumstances as well [5], therefore we need to copy data
before passing it to deflate().
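
The fix boils down to compressing from a private snapshot rather than the live page. A sketch, with a fake checksum standing in for deflate() and a shrunken page size for brevity:

```c
#include <stdint.h>
#include <string.h>

/* Compress from a private snapshot, never from the live guest page, so
 * concurrent guest writes (or zlib double fetches) cannot observe two
 * different values for one byte.  compress_page() is a stand-in. */
enum { PAGE_SIZE = 16 };

static uint32_t compress_page(const uint8_t *buf)
{
    uint32_t sum = 0;
    for (int i = 0; i < PAGE_SIZE; i++) {
        sum += buf[i];
    }
    return sum;
}

static uint32_t send_page(const uint8_t *live_page)
{
    uint8_t copy[PAGE_SIZE];
    memcpy(copy, live_page, PAGE_SIZE);     /* stable snapshot */
    return compress_page(copy);             /* reads only the copy */
}
```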

[1] https://zlib.net/manual.html
[2] https://github.com/madler/zlib/pull/410
[3] https://lists.nongnu.org/archive/html/qemu-devel/2022-03/msg03988.html
[4] http://publibfp.dhe.ibm.com/epubs/pdf/a227832c.pdf
[5] https://lists.gnu.org/archive/html/qemu-devel/2022-07/msg00889.html

Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Message-Id: <20220705203559.2960949-1-iii@linux.ibm.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

* migration: Add postcopy-preempt capability

Firstly, postcopy already preempts precopy due to the fact that we do
unqueue_page() first before looking into dirty bits.

However, that's not enough. For example, with host huge pages enabled,
while a precopy huge page is being sent a postcopy request needs to wait
until the whole huge page transfer finishes.  That can introduce quite
some delay; the bigger the huge page, the larger the delay.

This patch adds a new capability to allow postcopy requests to preempt an
in-flight precopy huge page, so that postcopy requests can be serviced
even faster.

Meanwhile to send it even faster, bypass the precopy stream by providing a
standalone postcopy socket for sending requested pages.

Since the new behavior is not compatible with the old one, it is not the
default; it's enabled only when the new capability is set on both
src/dst QEMUs.

This patch only adds the capability itself, the logic will be added in follow
up patches.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20220707185342.26794-2-peterx@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

* migration: Postcopy preemption preparation on channel creation

Create a new socket for postcopy to be prepared to send postcopy requested
pages via this specific channel, so as to not get blocked by precopy pages.

A new thread is also created on dest qemu to receive data from this new channel
based on the ram_load_postcopy() routine.

The ram_load_postcopy(POSTCOPY) branch and the thread have not started to
function yet; that will be done in follow-up patches.

Clean up the new sockets on both src/dst QEMUs, and also look after the
new thread to make sure it is recycled properly.

Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20220707185502.27149-1-peterx@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
  dgilbert: With Peter's fix to quieten compiler warning on
       start_migration

* migration: Postcopy preemption enablement

This patch enables postcopy-preempt feature.

It contains two major changes to the migration logic:

(1) Postcopy requests are now sent via a different socket from precopy
    background migration stream, so as to be isolated from very high page
    request delays.

(2) For huge page enabled hosts: when there's postcopy requests, they can now
    intercept a partial sending of huge host pages on src QEMU.

After this patch, we'll live migrate a VM with two channels for postcopy: (1)
PRECOPY channel, which is the default channel that transfers background pages;
and (2) POSTCOPY channel, which only transfers requested pages.

There's no strict rule about which channel to use, e.g., if a requested
page is already being transferred on the precopy channel, we keep using
that precopy channel to transfer the page even if it's explicitly
requested.  In 99% of the cases we'll prioritize the channels so that we
send requested pages via the postcopy channel for as long as possible.

On the source QEMU, when we find a postcopy request, we'll interrupt the
PRECOPY channel sending process and quickly switch to the POSTCOPY channel.
After we have serviced all the high priority postcopy pages, we'll switch
back to the PRECOPY channel and continue to send the interrupted huge page.
There's no new thread introduced on src QEMU.

On the destination QEMU, one new thread is introduced to receive page data from
the postcopy specific socket (done in the preparation patch).

This patch has a side effect: previously, after sending a postcopy page
we assumed the guest would access the follow-up pages, so we kept sending
from there.  Now it's changed: instead of going on from a postcopy
requested page, we go back and continue sending the precopy huge page
(which can be intercepted by a postcopy request, so the huge page may
have been partially sent before).

Whether that's a problem is debatable, because "assuming the guest will
continue to access the next page" may not really suit huge pages,
especially large ones (e.g. 1GB pages).  That locality hint is fairly
meaningless when huge pages are used.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20220707185504.27203-1-peterx@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

* migration: Postcopy recover with preempt enabled

To allow postcopy recovery, the ram fast load (preempt-only) dest QEMU thread
needs similar handling on fault tolerance.  When ram_load_postcopy() fails,
instead of stopping the thread it halts with a semaphore, preparing to be
kicked again when recovery is detected.

A mutex is introduced to make sure there's no concurrent operation upon the
socket.  To make it simple, the fast ram load thread will take the mutex during
its whole procedure, and only release it if it's paused.  The fast-path socket
will be properly released by the main loading thread safely when there's
network failures during postcopy with that mutex held.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20220707185506.27257-1-peterx@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

* migration: Create the postcopy preempt channel asynchronously

This patch allows the postcopy preempt channel to be created
asynchronously.  The benefit is that when the connection is slow, we won't
take the BQL (and potentially block all things like QMP) for a long time
without releasing.

A function postcopy_preempt_wait_channel() is introduced, allowing the
migration thread to wait on the channel creation.  The channel is always
created by the main thread, which kicks a new semaphore to tell the
migration thread that the channel has been created.

We'll need to wait for the new channel in two places: (1) when there's a
new postcopy migration that is starting, or (2) when there's a postcopy
migration to resume.

For the start of migration, we don't need to wait for this channel until
when we want to start postcopy, aka, postcopy_start().  We'll fail the
migration if we find that the channel creation failed (which should
probably not happen at all in 99% of the cases, because the main channel is
using the same network topology).

For a postcopy recovery, we'll need to wait in postcopy_pause().  In that
case, if the channel creation failed, we can't fail the migration or we'd
crash the VM; instead we stay in the PAUSED state, waiting for yet
another recovery.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Manish Mishra <manish.mishra@nutanix.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20220707185509.27311-1-peterx@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

* migration: Add property x-postcopy-preempt-break-huge

Add a property field that can conditionally disable the "break sending huge
page" behavior in postcopy preemption.  By default it's enabled.

It should only be used for debugging purposes, and we should never remove
the "x-" prefix.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Manish Mishra <manish.mishra@nutanix.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20220707185511.27366-1-peterx@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

* migration: Add helpers to detect TLS capability

Add migrate_channel_requires_tls() to detect whether the specific channel
requires TLS, leveraging the recently introduced migrate_use_tls().  No
functional change intended.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20220707185513.27421-1-peterx@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

* migration: Export tls-[creds|hostname|authz] params to cmdline too

It's useful for specifying tls credentials all in the cmdline (along with
the -object tls-creds-*), especially for debugging purpose.

The trick here is we must remember not to free these fields again in the
finalize() function of the migration object, otherwise it'll cause a
double-free.

The thing is, when destroying an object we first destroy the properties
bound to the object, then the object itself.  To be explicit, when
destroying the object in object_finalize() we have this sequence of
operations:

    object_property_del_all(obj);
    object_deinit(obj, ti);

So after this change the two fields are already properly released even
before reaching the finalize() function, in object_property_del_all();
hence we no longer need to free them in finalize(), which would be a
double-free.

This also fixes a trivial memory leak for tls-authz as we forgot to free it
before this patch.

Reviewed-by: Daniel P. Berrange <berrange@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20220707185515.27475-1-peterx@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

* migration: Enable TLS for preempt channel

This patch is based on the async preempt channel creation.  It continues
by wiring up the new channel with a TLS handshake to the destination when
enabled.

Note that only the src QEMU needs such operation; the dest QEMU does not
need any change for TLS support due to the fact that all channels are
established synchronously there, so all the TLS magic is already properly
handled by migration_tls_channel_process_incoming().

Reviewed-by: Daniel P. Berrange <berrange@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20220707185518.27529-1-peterx@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

* migration: Respect postcopy request order in preemption mode

With preemption mode on, when we see a postcopy request for exactly the
page that we preempted before (so we've partially sent the page via the
PRECOPY channel and it got preempted by another postcopy request),
currently we drop the request, so that only after all the other postcopy
requests are serviced do we go back to the precopy stream and handle it.

We dropped the request because we can't send it via the postcopy channel:
the precopy channel already contains part of the data, and we can only
send a huge page via one channel as a whole.  We can't split a huge page
into two channels.

That's a very corner case and it works, but it changes the order in which
we handle postcopy requests, since we postpone this (unlucky) postcopy
request until after the other queued postcopy requests.  The problem is
that when the guest is very busy, the postcopy queue may stay non-empty,
which means this dropped request will never be handled until the end of
the postcopy migration. So there's a chance that one dest QEMU vcpu
thread waits on a page fault for an extremely long time just because it
unluckily accessed the specific page that was preempted before.

The worst case time it needs can be as long as the whole postcopy migration
procedure.  It's extremely unlikely to happen, but when it happens it's not
good.

The root cause of this problem is that we treat the pss->postcopy_requested
variable as carrying two meanings bound together, as the variable shows:

  1. Whether this page request is urgent, and,
  2. Which channel we should use for this page request.

With the old code, when we set postcopy_requested it means either both (1)
and (2) are true, or both are false.  We can never have (1) and (2) take
different values.

However, it doesn't necessarily need to be like that.  It's perfectly legal
for a request to have (1) very high urgency while (2) we'd still like to use
the precopy channel, just like the corner case we were discussing above.

To differentiate the two meanings, introduce a new field called
postcopy_target_channel, indicating which channel we should use for this page
request, so as to cover only the old meaning (2).  The postcopy_requested
variable then stands only for meaning (1), the urgency of this page request.

With this change, we can easily boost the priority of a preempted precopy
page as long as we know that page is also requested as a postcopy page.  So
with the new approach, instead of dropping the request in get_queued_page(),
we send it right away via the precopy channel, and we get back the ordering
of the page faults just as they were requested on the destination.
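The decoupling above can be sketched as follows (a Python sketch with a
hypothetical helper name, route_page_request; the real logic is C code in
QEMU's migration/ram.c, and only the two field names come from the commit
message):

```python
PRECOPY, POSTCOPY = "precopy", "postcopy"

def route_page_request(preempted_huge_page):
    """Decide urgency and channel for a faulted page.

    Old behaviour: a single postcopy_requested flag meant both "urgent"
    and "send via the postcopy channel", so a preempted huge page had to
    be dropped from the urgent handling entirely.

    New behaviour: urgency (postcopy_requested) and routing
    (postcopy_target_channel) are independent, so the preempted page can
    stay urgent while still being completed on the precopy channel.
    """
    if preempted_huge_page:
        # Partially sent via precopy already: must finish on the same
        # channel, but service it right away instead of postponing it.
        return {"postcopy_requested": True,
                "postcopy_target_channel": PRECOPY}
    return {"postcopy_requested": True,
            "postcopy_target_channel": POSTCOPY}
```

The point is only that the two flags can now take different values, which is
exactly the combination the old single flag could not express.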

Reported-by: Manish Mishra <manish.mishra@nutanix.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Manish Mishra <manish.mishra@nutanix.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20220707185520.27583-1-peterx@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

* tests: Move MigrateCommon upper

So that it can soon be used in postcopy tests too.

Reviewed-by: Daniel P. Berrange <berrange@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20220707185522.27638-1-peterx@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

* tests: Add postcopy tls migration test

We just added TLS tests for precopy but not postcopy.  Add the
corresponding test for vanilla postcopy.

Rename the vanilla postcopy test to "postcopy/plain", since all postcopy
tests use only unix sockets as the channel.

Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20220707185525.27692-1-peterx@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
  dgilbert: Manual merge

* tests: Add postcopy tls recovery migration test

It's easy to build this upon the postcopy tls test.  Rename the old
postcopy recovery test to postcopy/recovery/plain.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20220707185527.27747-1-peterx@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
  dgilbert: Manual merge

* tests: Add postcopy preempt tests

Four tests are added for preempt mode:

  - Postcopy plain
  - Postcopy recovery
  - Postcopy tls
  - Postcopy tls+recovery

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20220707185530.27801-1-peterx@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
  dgilbert: Manual merge

* migration: remove unreachable code after reading data

The code calls qio_channel_read() in a loop while it reports
QIO_CHANNEL_ERR_BLOCK.  This error is reported when errno == EAGAIN.

As such, the later block of code will always hit the 'errno != EAGAIN'
condition, making the final 'else' unreachable.
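The shape of the problem can be sketched like this (hypothetical names,
modelling the C control flow in Python; only QIO_CHANNEL_ERR_BLOCK is a real
QEMU identifier):

```python
QIO_CHANNEL_ERR_BLOCK = -2  # what qio_channel_read() reports on EAGAIN

class FakeChannel:
    """Test double standing in for a QIOChannel."""
    def __init__(self, results):
        self._results = iter(results)

    def read(self):
        return next(self._results)

def read_with_retry(chan):
    # The retry loop consumes every QIO_CHANNEL_ERR_BLOCK result...
    while True:
        n = chan.read()
        if n != QIO_CHANNEL_ERR_BLOCK:
            break
    # ...so if an error reaches this point, errno != EAGAIN necessarily
    # holds, and a trailing 'else' branch for errno == EAGAIN would be
    # dead code -- which is what Coverity flagged.
    if n < 0:
        return -1
    return n
```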

Fixes: Coverity CID 1490203
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20220627135318.156121-1-berrange@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

* QIOChannelSocket: Fix zero-copy flush returning code 1 when nothing sent

If flush is called when no buffer was sent with MSG_ZEROCOPY, it currently
returns 1.  That return code should be used only when Linux fails to use
MSG_ZEROCOPY on a number of sendmsg() calls.

Fix this by returning early from flush if no sendmsg(..., MSG_ZEROCOPY)
was attempted.
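In outline, the fix is an early return before the error-queue walk (a sketch
with made-up counter and helper names mirroring the commit description, not
the exact QEMU code):

```python
def zero_copy_flush(zero_copy_queued, zero_copy_sent):
    """Return 0 on success; return 1 only when the kernel actually fell
    back to copying every MSG_ZEROCOPY send since the last flush."""
    if zero_copy_queued == zero_copy_sent:
        # Nothing was ever sent with MSG_ZEROCOPY: nothing to flush,
        # so report success instead of the old spurious return of 1.
        return 0
    outstanding = zero_copy_queued - zero_copy_sent
    copied = drain_error_queue(outstanding)
    return 1 if copied == outstanding else 0

def drain_error_queue(outstanding):
    # Stand-in for walking MSG_ERRQUEUE completions; pretend none of
    # them reported that the kernel copied the data.
    return 0
```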

Fixes: 2bc58ffc2926 ("QIOChannelSocket: Implement io_writev zero copy flag & io_flush for CONFIG_LINUX")
Signed-off-by: Leonardo Bras <leobras@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Acked-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Message-Id: <20220711211112.18951-2-leobras@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

* Add dirty-sync-missed-zero-copy migration stat

Signed-off-by: Leonardo Bras <leobras@redhat.com>
Acked-by: Markus Armbruster <armbru@redhat.com>
Acked-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20220711211112.18951-3-leobras@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

* migration/multifd: Report to user when zerocopy not working

Some errors, like the lack of Scatter-Gather support in the network
interface (NETIF_F_SG), may cause sendmsg(..., MSG_ZEROCOPY) to fail to use
zero-copy, in which case it falls back to the default copying mechanism.

After each full dirty-bitmap scan there should be a zero-copy flush, which
checks for errors in each of the previous sendmsg(..., MSG_ZEROCOPY) calls.
If all of them failed to use zero-copy, increment the
dirty_sync_missed_zero_copy migration stat to let the user know about it.
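The accounting described above can be sketched as follows (hypothetical
function and dictionary names; only the stat name
dirty_sync_missed_zero_copy comes from the commit):

```python
migration_stats = {"dirty_sync_missed_zero_copy": 0}

def flush_channels_after_sync(flush_results):
    """flush_results holds one return code per multifd channel from the
    zero-copy flush, where 1 means every MSG_ZEROCOPY send on that
    channel was silently copied by the kernel (e.g. the NIC lacks
    NETIF_F_SG support)."""
    if any(rc == 1 for rc in flush_results):
        migration_stats["dirty_sync_missed_zero_copy"] += 1

flush_channels_after_sync([0, 0])  # zero-copy worked everywhere
flush_channels_after_sync([0, 1])  # one channel fell back: stat bumped
```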

Signed-off-by: Leonardo Bras <leobras@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Acked-by: Peter Xu <peterx@redhat.com>
Message-Id: <20220711211112.18951-4-leobras@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

* multifd: Document the locking of MultiFD{Send/Recv}Params

Reorder the structures so we can tell whether the fields are:
- Read only
- Protected by their own locking (i.e. sems)
- Protected by 'mutex'
- Only for the multifd channel

Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-Id: <20220531104318.7494-2-quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
  dgilbert: Typo fixes from Chen Zhang

* migration: Avoid false-positive on non-supported scenarios for zero-copy-send

Migration with zero-copy-send currently has its limitations: it can't be
used with TLS or any kind of compression.  In such scenarios, it should
output errors during parameter / capability setting.

But currently there are some ways of setting these unsupported scenarios
without the error message being printed:

1) For the 'compression' capability, it works by enabling it together with
zero-copy-send. This happens because the validity test for zero-copy uses
the helper function migrate_use_compression(), which checks for compression
presence in s->enabled_capabilities[MIGRATION_CAPABILITY_COMPRESS].

The point here is: the validity test happens before the capability gets
enabled. If all of them get enabled together, this test will not return
error.

In order to fix that, replace migrate_use_compression() with a direct test
of the cap_list parameter in migrate_caps_check().

2) For features enabled by parameters such as TLS & 'multifd_compression',
there was also a way of setting unsupported scenarios: setting
zero-copy-send first, then setting the unsupported parameter.

In order to fix that, also add a check for parameters conflicting with
zero-copy-send in migrate_params_check().

3) XBZRLE is also a compression capability, so it makes sense to also add
it to the list of capabilities which are not supported with zero-copy-send.
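The two check sites can be sketched like this (a Python sketch; the function
names migrate_caps_check() and migrate_params_check() come from the commit,
while the set names and exact strings are illustrative):

```python
INCOMPATIBLE_CAPS = {"compress", "xbzrle"}
INCOMPATIBLE_PARAMS = {"tls-creds", "multifd-compression"}

def migrate_caps_check(cap_list):
    """Validate the capability list being set right now, rather than the
    already-enabled state, so that enabling 'compress' together with
    'zero-copy-send' in one command is rejected too."""
    if "zero-copy-send" in cap_list and cap_list & INCOMPATIBLE_CAPS:
        return "zero-copy-send does not support compression or xbzrle"
    return None

def migrate_params_check(enabled_caps, param_name):
    """Mirror check: reject a conflicting parameter that is set after
    zero-copy-send was already enabled."""
    if "zero-copy-send" in enabled_caps and param_name in INCOMPATIBLE_PARAMS:
        return "%s conflicts with zero-copy-send" % param_name
    return None
```

Checking the incoming cap_list rather than the enabled state is what closes
the "enable both at once" hole described in point 1.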

Fixes: 1abaec9a1b2c ("migration: Change zero_copy_send from migration parameter to migration capability")
Signed-off-by: Leonardo Bras <leobras@redhat.com>
Message-Id: <20220719122345.253713-1-leobras@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

* Revert "gitlab: disable accelerated zlib for s390x"

This reverts commit 309df6acb29346f89e1ee542b1986f60cab12b87.
With Ilya's 'multifd: Copy pages before compressing them with zlib'
in the latest migration series, this shouldn't be a problem any more.

Suggested-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>

* slow snapshots api

Co-authored-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Co-authored-by: Paolo Bonzini <pbonzini@redhat.com>
Co-authored-by: Peter Maydell <peter.maydell@linaro.org>
Co-authored-by: Joel Stanley <joel@jms.id.au>
Co-authored-by: Peter Delevoryas <pdel@fb.com>
Co-authored-by: Peter Delevoryas <peter@pjd.dev>
Co-authored-by: Cédric Le Goater <clg@kaod.org>
Co-authored-by: Iris Chen <irischenlj@fb.com>
Co-authored-by: Jinhao Fan <fanjinhao21s@ict.ac.cn>
Co-authored-by: Niklas Cassel <niklas.cassel@wdc.com>
Co-authored-by: Darren Kenny <darren.kenny@oracle.com>
Co-authored-by: Konstantin Kostiuk <kkostiuk@redhat.com>
Co-authored-by: Richard Henderson <richard.henderson@linaro.org>
Co-authored-by: Hao Wu <wuhaotsh@google.com>
Co-authored-by: Andrey Makarov <ph.makarov@gmail.com>
Co-authored-by: Jason A. Donenfeld <Jason@zx2c4.com>
Co-authored-by: Murilo Opsfelder Araujo <muriloo@linux.ibm.com>
Co-authored-by: Matheus Ferst <matheus.ferst@eldorado.org.br>
Co-authored-by: Leandro Lupori <leandro.lupori@eldorado.org.br>
Co-authored-by: Lucas Coutinho <lucas.coutinho@eldorado.org.br>
Co-authored-by: John Snow <jsnow@redhat.com>
Co-authored-by: Song Gao <gaosong@loongson.cn>
Co-authored-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Co-authored-by: Thomas Huth <thuth@redhat.com>
Co-authored-by: Ilya Leoshkevich <iii@linux.ibm.com>
Co-authored-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Co-authored-by: Gerd Hoffmann <kraxel@redhat.com>
Co-authored-by: Mauro Matteo Cascella <mcascell@redhat.com>
Co-authored-by: Felix xq Queißner <xq@random-projects.net>
Co-authored-by: Xiaojuan Yang <yangxiaojuan@loongson.cn>
Co-authored-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Co-authored-by: Taylor Simpson <tsimpson@quicinc.com>
Co-authored-by: Eugenio Pérez <eperezma@redhat.com>
Co-authored-by: Zhang Chen <chen.zhang@intel.com>
Co-authored-by: Hyman Huang(黄勇) <huangy81@chinatelecom.cn>
Co-authored-by: Peter Xu <peterx@redhat.com>
Co-authored-by: Daniel P. Berrangé <berrange@redhat.com>
Co-authored-by: Leonardo Bras <leobras@redhat.com>
Co-authored-by: Juan Quintela <quintela@redhat.com>
Co-authored-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
This commit is contained in:
Andrea Fioraldi, 2022-07-22 17:02:58 +02:00 (committed by GitHub)
parent 03e283c858, commit c6a00ab288
720 changed files with 37013 additions and 24515 deletions

View File

@@ -599,7 +599,7 @@ build-tools-and-docs-debian:
   optional: true
   variables:
     IMAGE: debian-amd64
-    MAKE_CHECK_ARGS: check-unit check-softfloat ctags TAGS cscope
+    MAKE_CHECK_ARGS: check-unit ctags TAGS cscope
     CONFIGURE_ARGS: --disable-system --disable-user --enable-docs --enable-tools
     QEMU_JOB_PUBLISH: 1
   artifacts:

View File

@@ -1,4 +1,5 @@
 # THIS FILE WAS AUTO-GENERATED
+# ... and then edited to fix py39, pending proper lcitool update.
 #
 #  $ lcitool variables freebsd-12 qemu
 #
@@ -11,6 +12,6 @@ MAKE='/usr/local/bin/gmake'
 NINJA='/usr/local/bin/ninja'
 PACKAGING_COMMAND='pkg'
 PIP3='/usr/local/bin/pip-3.8'
-PKGS='alsa-lib bash bzip2 ca_root_nss capstone4 ccache cdrkit-genisoimage ctags curl cyrus-sasl dbus diffutils dtc fusefs-libs3 gettext git glib gmake gnutls gsed gtk3 libepoxy libffi libgcrypt libjpeg-turbo libnfs libspice-server libssh libtasn1 llvm lzo2 meson ncurses nettle ninja opencv perl5 pixman pkgconf png py38-numpy py38-pillow py38-pip py38-sphinx py38-sphinx_rtd_theme py38-virtualenv py38-yaml python3 rpm2cpio sdl2 sdl2_image snappy spice-protocol tesseract texinfo usbredir virglrenderer vte3 zstd'
+PKGS='alsa-lib bash bzip2 ca_root_nss capstone4 ccache cdrkit-genisoimage ctags curl cyrus-sasl dbus diffutils dtc fusefs-libs3 gettext git glib gmake gnutls gsed gtk3 libepoxy libffi libgcrypt libjpeg-turbo libnfs libspice-server libssh libtasn1 llvm lzo2 meson ncurses nettle ninja opencv perl5 pixman pkgconf png py39-numpy py39-pillow py39-pip py39-sphinx py39-sphinx_rtd_theme py39-virtualenv py39-yaml python3 rpm2cpio sdl2 sdl2_image snappy spice-protocol tesseract texinfo usbredir virglrenderer vte3 zstd'
 PYPI_PKGS=''
 PYTHON='/usr/local/bin/python3'

View File

@@ -1,4 +1,5 @@
 # THIS FILE WAS AUTO-GENERATED
+# ... and then edited to fix py39, pending proper lcitool update.
 #
 #  $ lcitool variables freebsd-13 qemu
 #
@@ -11,6 +12,6 @@ MAKE='/usr/local/bin/gmake'
 NINJA='/usr/local/bin/ninja'
 PACKAGING_COMMAND='pkg'
 PIP3='/usr/local/bin/pip-3.8'
-PKGS='alsa-lib bash bzip2 ca_root_nss capstone4 ccache cdrkit-genisoimage ctags curl cyrus-sasl dbus diffutils dtc fusefs-libs3 gettext git glib gmake gnutls gsed gtk3 libepoxy libffi libgcrypt libjpeg-turbo libnfs libspice-server libssh libtasn1 llvm lzo2 meson ncurses nettle ninja opencv perl5 pixman pkgconf png py38-numpy py38-pillow py38-pip py38-sphinx py38-sphinx_rtd_theme py38-virtualenv py38-yaml python3 rpm2cpio sdl2 sdl2_image snappy spice-protocol tesseract texinfo usbredir virglrenderer vte3 zstd'
+PKGS='alsa-lib bash bzip2 ca_root_nss capstone4 ccache cdrkit-genisoimage ctags curl cyrus-sasl dbus diffutils dtc fusefs-libs3 gettext git glib gmake gnutls gsed gtk3 libepoxy libffi libgcrypt libjpeg-turbo libnfs libspice-server libssh libtasn1 llvm lzo2 meson ncurses nettle ninja opencv perl5 pixman pkgconf png py39-numpy py39-pillow py39-pip py39-sphinx py39-sphinx_rtd_theme py39-virtualenv py39-yaml python3 rpm2cpio sdl2 sdl2_image snappy spice-protocol tesseract texinfo usbredir virglrenderer vte3 zstd'
 PYPI_PKGS=''
 PYTHON='/usr/local/bin/python3'

View File

@@ -8,8 +8,6 @@ ubuntu-20.04-s390x-all-linux-static:
  tags:
  - ubuntu_20.04
  - s390x
- variables:
-  DFLTCC: 0
  rules:
  - if: '$CI_PROJECT_NAMESPACE == "qemu-project" && $CI_COMMIT_BRANCH =~ /^staging/'
  - if: "$S390X_RUNNER_AVAILABLE"
@@ -29,8 +27,7 @@ ubuntu-20.04-s390x-all:
  tags:
  - ubuntu_20.04
  - s390x
- variables:
-  DFLTCC: 0
+ timeout: 75m
  rules:
  - if: '$CI_PROJECT_NAMESPACE == "qemu-project" && $CI_COMMIT_BRANCH =~ /^staging/'
  - if: "$S390X_RUNNER_AVAILABLE"
@@ -47,8 +44,6 @@ ubuntu-20.04-s390x-alldbg:
  tags:
  - ubuntu_20.04
  - s390x
- variables:
-  DFLTCC: 0
  rules:
  - if: '$CI_PROJECT_NAMESPACE == "qemu-project" && $CI_COMMIT_BRANCH =~ /^staging/'
    when: manual
@@ -70,8 +65,6 @@ ubuntu-20.04-s390x-clang:
  tags:
  - ubuntu_20.04
  - s390x
- variables:
-  DFLTCC: 0
  rules:
  - if: '$CI_PROJECT_NAMESPACE == "qemu-project" && $CI_COMMIT_BRANCH =~ /^staging/'
    when: manual
@@ -92,8 +85,6 @@ ubuntu-20.04-s390x-tci:
  tags:
  - ubuntu_20.04
  - s390x
- variables:
-  DFLTCC: 0
  rules:
  - if: '$CI_PROJECT_NAMESPACE == "qemu-project" && $CI_COMMIT_BRANCH =~ /^staging/'
    when: manual
@@ -113,8 +104,6 @@ ubuntu-20.04-s390x-notcg:
  tags:
  - ubuntu_20.04
  - s390x
- variables:
-  DFLTCC: 0
  rules:
  - if: '$CI_PROJECT_NAMESPACE == "qemu-project" && $CI_COMMIT_BRANCH =~ /^staging/'
    when: manual

View File

@@ -1,60 +1,85 @@
 # All jobs needing docker-edk2 must use the same rules it uses.
 .edk2_job_rules:
-  rules: # Only run this job when ...
-    - changes:
-        # this file is modified
-        - .gitlab-ci.d/edk2.yml
-        # or the Dockerfile is modified
-        - .gitlab-ci.d/edk2/Dockerfile
-        # or roms/edk2/ is modified (submodule updated)
-        - roms/edk2/*
-      when: on_success
-    - if: '$CI_COMMIT_REF_NAME =~ /^edk2/' # or the branch/tag starts with 'edk2'
-      when: on_success
-    - if: '$CI_COMMIT_MESSAGE =~ /edk2/i' # or last commit description contains 'EDK2'
-      when: on_success
+  rules:
+    # Forks don't get pipelines unless QEMU_CI=1 or QEMU_CI=2 is set
+    - if: '$QEMU_CI != "1" && $QEMU_CI != "2" && $CI_PROJECT_NAMESPACE != "qemu-project"'
+      when: never
+
+    # In forks, if QEMU_CI=1 is set, then create manual job
+    # if any of the files affecting the build are touched
+    - if: '$QEMU_CI == "1" && $CI_PROJECT_NAMESPACE != "qemu-project"'
+      changes:
+        - .gitlab-ci.d/edk2.yml
+        - .gitlab-ci.d/edk2/Dockerfile
+        - roms/edk2/*
+      when: manual
+
+    # In forks, if QEMU_CI=1 is set, then create manual job
+    # if the branch/tag starts with 'edk2'
+    - if: '$QEMU_CI == "1" && $CI_PROJECT_NAMESPACE != "qemu-project" && $CI_COMMIT_REF_NAME =~ /^edk2/'
+      when: manual
+
+    # In forks, if QEMU_CI=1 is set, then create manual job
+    # if last commit msg contains 'EDK2' (case insensitive)
+    - if: '$QEMU_CI == "1" && $CI_PROJECT_NAMESPACE != "qemu-project" && $CI_COMMIT_MESSAGE =~ /edk2/i'
+      when: manual
+
+    # Run if any files affecting the build output are touched
+    - changes:
+        - .gitlab-ci.d/edk2.yml
+        - .gitlab-ci.d/edk2/Dockerfile
+        - roms/edk2/*
+      when: on_success
+
+    # Run if the branch/tag starts with 'edk2'
+    - if: '$CI_COMMIT_REF_NAME =~ /^edk2/'
+      when: on_success
+
+    # Run if last commit msg contains 'EDK2' (case insensitive)
+    - if: '$CI_COMMIT_MESSAGE =~ /edk2/i'
+      when: on_success
 
 docker-edk2:
   extends: .edk2_job_rules
   stage: containers
   image: docker:19.03.1
   services:
   - docker:19.03.1-dind
   variables:
     GIT_DEPTH: 3
     IMAGE_TAG: $CI_REGISTRY_IMAGE:edk2-cross-build
     # We don't use TLS
     DOCKER_HOST: tcp://docker:2375
     DOCKER_TLS_CERTDIR: ""
   before_script:
   - docker login -u $CI_REGISTRY_USER -p $CI_REGISTRY_PASSWORD $CI_REGISTRY
   script:
   - docker pull $IMAGE_TAG || true
   - docker build --cache-from $IMAGE_TAG --tag $CI_REGISTRY_IMAGE:$CI_COMMIT_SHA
       --tag $IMAGE_TAG .gitlab-ci.d/edk2
   - docker push $CI_REGISTRY_IMAGE:$CI_COMMIT_SHA
   - docker push $IMAGE_TAG
 
 build-edk2:
   extends: .edk2_job_rules
   stage: build
   needs: ['docker-edk2']
   artifacts:
     paths: # 'artifacts.zip' will contains the following files:
     - pc-bios/edk2*bz2
     - pc-bios/edk2-licenses.txt
     - edk2-stdout.log
     - edk2-stderr.log
   image: $CI_REGISTRY_IMAGE:edk2-cross-build
   variables:
     GIT_DEPTH: 3
   script: # Clone the required submodules and build EDK2
   - git submodule update --init roms/edk2
   - git -C roms/edk2 submodule update --init --
       ArmPkg/Library/ArmSoftFloatLib/berkeley-softfloat-3
       BaseTools/Source/C/BrotliCompress/brotli
       CryptoPkg/Library/OpensslLib/openssl
       MdeModulePkg/Library/BrotliCustomDecompressLib/brotli
   - export JOBS=$(($(getconf _NPROCESSORS_ONLN) + 1))
   - echo "=== Using ${JOBS} simultaneous jobs ==="
   - make -j${JOBS} -C roms efi 2>&1 1>edk2-stdout.log | tee -a edk2-stderr.log >&2

View File

@@ -1,61 +1,85 @@
 # All jobs needing docker-opensbi must use the same rules it uses.
 .opensbi_job_rules:
-  rules: # Only run this job when ...
-    - changes:
-        # this file is modified
-        - .gitlab-ci.d/opensbi.yml
-        # or the Dockerfile is modified
-        - .gitlab-ci.d/opensbi/Dockerfile
-      when: on_success
-    - changes: # or roms/opensbi/ is modified (submodule updated)
-        - roms/opensbi/*
-      when: on_success
-    - if: '$CI_COMMIT_REF_NAME =~ /^opensbi/' # or the branch/tag starts with 'opensbi'
-      when: on_success
-    - if: '$CI_COMMIT_MESSAGE =~ /opensbi/i' # or last commit description contains 'OpenSBI'
-      when: on_success
+  rules:
+    # Forks don't get pipelines unless QEMU_CI=1 or QEMU_CI=2 is set
+    - if: '$QEMU_CI != "1" && $QEMU_CI != "2" && $CI_PROJECT_NAMESPACE != "qemu-project"'
+      when: never
+
+    # In forks, if QEMU_CI=1 is set, then create manual job
+    # if any files affecting the build output are touched
+    - if: '$QEMU_CI == "1" && $CI_PROJECT_NAMESPACE != "qemu-project"'
+      changes:
+        - .gitlab-ci.d/opensbi.yml
+        - .gitlab-ci.d/opensbi/Dockerfile
+        - roms/opensbi/*
+      when: manual
+
+    # In forks, if QEMU_CI=1 is set, then create manual job
+    # if the branch/tag starts with 'opensbi'
+    - if: '$QEMU_CI == "1" && $CI_PROJECT_NAMESPACE != "qemu-project" && $CI_COMMIT_REF_NAME =~ /^opensbi/'
+      when: manual
+
+    # In forks, if QEMU_CI=1 is set, then create manual job
+    # if the last commit msg contains 'OpenSBI' (case insensitive)
+    - if: '$QEMU_CI == "1" && $CI_PROJECT_NAMESPACE != "qemu-project" && $CI_COMMIT_MESSAGE =~ /opensbi/i'
+      when: manual
+
+    # Run if any files affecting the build output are touched
+    - changes:
+        - .gitlab-ci.d/opensbi.yml
+        - .gitlab-ci.d/opensbi/Dockerfile
+        - roms/opensbi/*
+      when: on_success
+
+    # Run if the branch/tag starts with 'opensbi'
+    - if: '$CI_COMMIT_REF_NAME =~ /^opensbi/'
+      when: on_success
+
+    # Run if the last commit msg contains 'OpenSBI' (case insensitive)
+    - if: '$CI_COMMIT_MESSAGE =~ /opensbi/i'
+      when: on_success
 
 docker-opensbi:
   extends: .opensbi_job_rules
   stage: containers
   image: docker:19.03.1
   services:
   - docker:19.03.1-dind
   variables:
     GIT_DEPTH: 3
     IMAGE_TAG: $CI_REGISTRY_IMAGE:opensbi-cross-build
     # We don't use TLS
     DOCKER_HOST: tcp://docker:2375
     DOCKER_TLS_CERTDIR: ""
   before_script:
   - docker login -u $CI_REGISTRY_USER -p $CI_REGISTRY_PASSWORD $CI_REGISTRY
   script:
   - docker pull $IMAGE_TAG || true
   - docker build --cache-from $IMAGE_TAG --tag $CI_REGISTRY_IMAGE:$CI_COMMIT_SHA
       --tag $IMAGE_TAG .gitlab-ci.d/opensbi
   - docker push $CI_REGISTRY_IMAGE:$CI_COMMIT_SHA
   - docker push $IMAGE_TAG
 
 build-opensbi:
   extends: .opensbi_job_rules
   stage: build
   needs: ['docker-opensbi']
   artifacts:
     paths: # 'artifacts.zip' will contains the following files:
     - pc-bios/opensbi-riscv32-generic-fw_dynamic.bin
     - pc-bios/opensbi-riscv64-generic-fw_dynamic.bin
     - opensbi32-generic-stdout.log
     - opensbi32-generic-stderr.log
     - opensbi64-generic-stdout.log
     - opensbi64-generic-stderr.log
   image: $CI_REGISTRY_IMAGE:opensbi-cross-build
   variables:
     GIT_DEPTH: 3
   script: # Clone the required submodules and build OpenSBI
   - git submodule update --init roms/opensbi
   - export JOBS=$(($(getconf _NPROCESSORS_ONLN) + 1))
   - echo "=== Using ${JOBS} simultaneous jobs ==="
   - make -j${JOBS} -C roms/opensbi clean
   - make -j${JOBS} -C roms opensbi32-generic 2>&1 1>opensbi32-generic-stdout.log | tee -a opensbi32-generic-stderr.log >&2
   - make -j${JOBS} -C roms/opensbi clean
   - make -j${JOBS} -C roms opensbi64-generic 2>&1 1>opensbi64-generic-stdout.log | tee -a opensbi64-generic-stderr.log >&2

View File

@@ -218,12 +218,11 @@ jobs:
         - TEST_CMD="make check check-tcg V=1"
         - CONFIG="--disable-containers --target-list=${MAIN_SOFTMMU_TARGETS},s390x-linux-user"
         - UNRELIABLE=true
-        - DFLTCC=0
       script:
         - BUILD_RC=0 && make -j${JOBS} || BUILD_RC=$?
         - |
           if [ "$BUILD_RC" -eq 0 ] ; then
-            mv pc-bios/s390-ccw/*.img pc-bios/ ;
+            mv pc-bios/s390-ccw/*.img qemu-bundle/usr/local/share/qemu ;
             ${TEST_CMD} ;
           else
             $(exit $BUILD_RC);
@@ -258,7 +257,7 @@ jobs:
       env:
         - CONFIG="--disable-containers --audio-drv-list=sdl --disable-user
             --target-list-exclude=${MAIN_SOFTMMU_TARGETS}"
-        - DFLTCC=0
     - name: "[s390x] GCC (user)"
       arch: s390x
       dist: focal
@@ -270,7 +269,7 @@ jobs:
         - ninja-build
       env:
         - CONFIG="--disable-containers --disable-system"
-        - DFLTCC=0
     - name: "[s390x] Clang (disable-tcg)"
       arch: s390x
       dist: focal
@@ -304,4 +303,3 @@ jobs:
         - CONFIG="--disable-containers --disable-tcg --enable-kvm
             --disable-tools --host-cc=clang --cxx=clang++"
         - UNRELIABLE=true
-        - DFLTCC=0

View File

@@ -165,8 +165,6 @@ F: tests/qtest/arm-cpu-features.c
 F: hw/arm/
 F: hw/cpu/a*mpcore.c
 F: include/hw/cpu/a*mpcore.h
-F: disas/arm-a64.cc
-F: disas/libvixl/
 F: docs/system/target-arm.rst
 F: docs/system/arm/cpu-features.rst
@@ -1067,6 +1065,7 @@ F: hw/net/ftgmac100.c
 F: include/hw/net/ftgmac100.h
 F: docs/system/arm/aspeed.rst
 F: tests/qtest/*aspeed*
+F: hw/arm/fby35.c
 
 NRF51
 M: Joel Stanley <joel@jms.id.au>
@@ -1840,7 +1839,6 @@ R: Ani Sinha <ani@anisinha.ca>
 S: Supported
 F: include/hw/acpi/*
 F: include/hw/firmware/smbios.h
-F: hw/mem/*
 F: hw/acpi/*
 F: hw/smbios/*
 F: hw/i386/acpi-build.[hc]
@@ -1851,6 +1849,7 @@ F: tests/qtest/acpi-utils.[hc]
 F: tests/data/acpi/
 F: docs/specs/acpi_cpu_hotplug.rst
 F: docs/specs/acpi_mem_hotplug.rst
+F: docs/specs/acpi_nvdimm.rst
 F: docs/specs/acpi_pci_hotplug.rst
 F: docs/specs/acpi_hw_reduced_hotplug.rst
@@ -2158,15 +2157,6 @@ F: qapi/rocker.json
 F: tests/rocker/
 F: docs/specs/rocker.txt
 
-NVDIMM
-M: Xiao Guangrong <xiaoguangrong.eric@gmail.com>
-S: Maintained
-F: hw/acpi/nvdimm.c
-F: hw/mem/nvdimm.c
-F: include/hw/mem/nvdimm.h
-F: docs/nvdimm.txt
-F: docs/specs/acpi_nvdimm.rst
-
 e1000x
 M: Dmitry Fleytman <dmitry.fleytman@gmail.com>
 S: Maintained
@@ -2588,6 +2578,7 @@ M: Ben Widawsky <ben.widawsky@intel.com>
 M: Jonathan Cameron <jonathan.cameron@huawei.com>
 S: Supported
 F: hw/cxl/
+F: hw/mem/cxl_type3.c
 F: include/hw/cxl/
 
 Dirty Bitmaps
@@ -2704,6 +2695,19 @@ F: softmmu/physmem.c
 F: include/exec/memory-internal.h
 F: scripts/coccinelle/memory-region-housekeeping.cocci
 
+Memory devices
+M: David Hildenbrand <david@redhat.com>
+M: Igor Mammedov <imammedo@redhat.com>
+R: Xiao Guangrong <xiaoguangrong.eric@gmail.com>
+S: Supported
+F: hw/mem/memory-device.c
+F: hw/mem/nvdimm.c
+F: hw/mem/pc-dimm.c
+F: include/hw/mem/memory-device.h
+F: include/hw/mem/nvdimm.h
+F: include/hw/mem/pc-dimm.h
+F: docs/nvdimm.txt
+
 SPICE
 M: Gerd Hoffmann <kraxel@redhat.com>
 S: Odd Fixes
@@ -2746,6 +2750,7 @@ F: softmmu/cpu-throttle.c
 F: softmmu/cpu-timers.c
 F: softmmu/icount.c
 F: softmmu/runstate-action.c
+F: softmmu/runstate.c
 F: qapi/run-state.json
 
 Read, Copy, Update (RCU)
@@ -2876,6 +2881,7 @@ T: git https://repo.or.cz/qemu/armbru.git qapi-next
 QEMU Guest Agent
 M: Michael Roth <michael.roth@amd.com>
+M: Konstantin Kostiuk <kkostiuk@redhat.com>
 S: Maintained
 F: qga/
 F: docs/interop/qemu-ga.rst
@@ -3307,8 +3313,6 @@ M: Richard Henderson <richard.henderson@linaro.org>
 S: Maintained
 L: qemu-arm@nongnu.org
 F: tcg/aarch64/
-F: disas/arm-a64.cc
-F: disas/libvixl/
 
 ARM TCG target
 M: Richard Henderson <richard.henderson@linaro.org>
@@ -3580,6 +3584,8 @@ M: Coiby Xu <Coiby.Xu@gmail.com>
 S: Maintained
 F: block/export/vhost-user-blk-server.c
 F: block/export/vhost-user-blk-server.h
+F: block/export/virtio-blk-handler.c
+F: block/export/virtio-blk-handler.h
 F: include/qemu/vhost-user-server.h
 F: tests/qtest/libqos/vhost-user-blk.c
 F: tests/qtest/libqos/vhost-user-blk.h
@@ -3592,6 +3598,13 @@ L: qemu-block@nongnu.org
 S: Supported
 F: block/export/fuse.c
 
+VDUSE library and block device exports
+M: Xie Yongji <xieyongji@bytedance.com>
+S: Maintained
+F: subprojects/libvduse/
+F: block/export/vduse-blk.c
+F: block/export/vduse-blk.h
+
 Replication
 M: Wen Congyang <wencongyang2@huawei.com>
 M: Xie Changlong <xiechanglong.d@gmail.com>


@@ -87,7 +87,7 @@ x := $(shell rm -rf meson-private meson-info meson-logs)
 endif
 
 # 1. ensure config-host.mak is up-to-date
-config-host.mak: $(SRC_PATH)/configure $(SRC_PATH)/scripts/meson-buildoptions.sh $(SRC_PATH)/pc-bios $(SRC_PATH)/VERSION
+config-host.mak: $(SRC_PATH)/configure $(SRC_PATH)/scripts/meson-buildoptions.sh $(SRC_PATH)/VERSION
 	@echo config-host.mak is out-of-date, running configure
 	@if test -f meson-private/coredata.dat; then \
 	  ./config.status --skip-meson; \
@@ -186,16 +186,14 @@ include $(SRC_PATH)/tests/Makefile.include
 all: recurse-all
 
-ROM_DIRS = $(addprefix pc-bios/, $(ROMS))
-ROM_DIRS_RULES=$(foreach t, all clean, $(addsuffix /$(t), $(ROM_DIRS)))
-# Only keep -O and -g cflags
-.PHONY: $(ROM_DIRS_RULES)
-$(ROM_DIRS_RULES):
+ROMS_RULES=$(foreach t, all clean, $(addsuffix /$(t), $(ROMS)))
+.PHONY: $(ROMS_RULES)
+$(ROMS_RULES):
 	$(call quiet-command,$(MAKE) $(SUBDIR_MAKEFLAGS) -C $(dir $@) V="$(V)" TARGET_DIR="$(dir $@)" $(notdir $@),)
 
 .PHONY: recurse-all recurse-clean
-recurse-all: $(addsuffix /all, $(ROM_DIRS))
-recurse-clean: $(addsuffix /clean, $(ROM_DIRS))
+recurse-all: $(addsuffix /all, $(ROMS))
+recurse-clean: $(addsuffix /clean, $(ROMS))
 
 ######################################################################
@@ -218,7 +216,7 @@ qemu-%.tar.bz2:
 distclean: clean
 	-$(quiet-@)test -f build.ninja && $(NINJA) $(NINJAFLAGS) -t clean -g || :
-	rm -f config-host.mak
+	rm -f config-host.mak qemu-bundle
 	rm -f tests/tcg/config-*.mak
 	rm -f config.status
 	rm -f roms/seabios/config.mak


@@ -49,6 +49,14 @@ AccelClass *accel_find(const char *opt_name)
     return ac;
 }
 
+/* Return the name of the current accelerator */
+const char *current_accel_name(void)
+{
+    AccelClass *ac = ACCEL_GET_CLASS(current_accel());
+
+    return ac->name;
+}
+
 static void accel_init_cpu_int_aux(ObjectClass *klass, void *opaque)
 {
     CPUClass *cc = CPU_CLASS(klass);


@@ -45,8 +45,10 @@
 #include "qemu/guest-random.h"
 #include "sysemu/hw_accel.h"
 #include "kvm-cpus.h"
+#include "sysemu/dirtylimit.h"
 #include "hw/boards.h"
+#include "monitor/stats.h"
 
 /* This check must be after config-host.h is included */
 #ifdef CONFIG_EVENTFD
@@ -476,6 +478,7 @@ int kvm_init_vcpu(CPUState *cpu, Error **errp)
     cpu->kvm_state = s;
     cpu->vcpu_dirty = true;
     cpu->dirty_pages = 0;
+    cpu->throttle_us_per_full = 0;
 
     mmap_size = kvm_ioctl(s, KVM_GET_VCPU_MMAP_SIZE, 0);
     if (mmap_size < 0) {
@@ -756,17 +759,20 @@ static uint32_t kvm_dirty_ring_reap_one(KVMState *s, CPUState *cpu)
 }
 
 /* Must be with slots_lock held */
-static uint64_t kvm_dirty_ring_reap_locked(KVMState *s)
+static uint64_t kvm_dirty_ring_reap_locked(KVMState *s, CPUState* cpu)
 {
     int ret;
-    CPUState *cpu;
     uint64_t total = 0;
     int64_t stamp;
 
     stamp = get_clock();
 
-    CPU_FOREACH(cpu) {
-        total += kvm_dirty_ring_reap_one(s, cpu);
+    if (cpu) {
+        total = kvm_dirty_ring_reap_one(s, cpu);
+    } else {
+        CPU_FOREACH(cpu) {
+            total += kvm_dirty_ring_reap_one(s, cpu);
+        }
     }
 
     if (total) {
@@ -787,7 +793,7 @@ static uint64_t kvm_dirty_ring_reap_locked(KVMState *s)
  * Currently for simplicity, we must hold BQL before calling this.  We can
  * consider to drop the BQL if we're clear with all the race conditions.
  */
-static uint64_t kvm_dirty_ring_reap(KVMState *s)
+static uint64_t kvm_dirty_ring_reap(KVMState *s, CPUState *cpu)
 {
     uint64_t total;
@@ -807,7 +813,7 @@ static uint64_t kvm_dirty_ring_reap(KVMState *s)
      * reset below.
      */
     kvm_slots_lock();
-    total = kvm_dirty_ring_reap_locked(s);
+    total = kvm_dirty_ring_reap_locked(s, cpu);
     kvm_slots_unlock();
 
     return total;
@@ -854,7 +860,7 @@ static void kvm_dirty_ring_flush(void)
      * vcpus out in a synchronous way.
      */
     kvm_cpu_synchronize_kick_all();
-    kvm_dirty_ring_reap(kvm_state);
+    kvm_dirty_ring_reap(kvm_state, NULL);
     trace_kvm_dirty_ring_flush(1);
 }
@@ -1398,7 +1404,7 @@ static void kvm_set_phys_mem(KVMMemoryListener *kml,
          * Not easy.  Let's cross the fingers until it's fixed.
          */
         if (kvm_state->kvm_dirty_ring_size) {
-            kvm_dirty_ring_reap_locked(kvm_state);
+            kvm_dirty_ring_reap_locked(kvm_state, NULL);
         } else {
             kvm_slot_get_dirty_log(kvm_state, mem);
         }
@@ -1466,11 +1472,16 @@ static void *kvm_dirty_ring_reaper_thread(void *data)
          */
         sleep(1);
 
+        /* Keep sleeping so that the reaper does not interfere with dirtylimit */
+        if (dirtylimit_in_service()) {
+            continue;
+        }
+
         trace_kvm_dirty_ring_reaper("wakeup");
         r->reaper_state = KVM_DIRTY_RING_REAPER_REAPING;
 
         qemu_mutex_lock_iothread();
-        kvm_dirty_ring_reap(s);
+        kvm_dirty_ring_reap(s, NULL);
         qemu_mutex_unlock_iothread();
 
         r->reaper_iteration++;
@@ -2310,6 +2321,15 @@ bool kvm_dirty_ring_enabled(void)
     return kvm_state->kvm_dirty_ring_size ? true : false;
 }
 
+static void query_stats_cb(StatsResultList **result, StatsTarget target,
+                           strList *names, strList *targets, Error **errp);
+static void query_stats_schemas_cb(StatsSchemaList **result, Error **errp);
+
+uint32_t kvm_dirty_ring_size(void)
+{
+    return kvm_state->kvm_dirty_ring_size;
+}
+
 static int kvm_init(MachineState *ms)
 {
     MachineClass *mc = MACHINE_GET_CLASS(ms);
@@ -2638,6 +2658,11 @@ static int kvm_init(MachineState *ms)
         }
     }
 
+    if (kvm_check_extension(kvm_state, KVM_CAP_BINARY_STATS_FD)) {
+        add_stats_callbacks(STATS_PROVIDER_KVM, query_stats_cb,
+                            query_stats_schemas_cb);
+    }
+
     return 0;
 
 err:
@@ -2957,8 +2982,19 @@ int kvm_cpu_exec(CPUState *cpu)
              */
             trace_kvm_dirty_ring_full(cpu->cpu_index);
             qemu_mutex_lock_iothread();
-            kvm_dirty_ring_reap(kvm_state);
+            /*
+             * We throttle a vCPU by making it sleep once it exits from the
+             * kernel due to a full dirty ring.  In the dirtylimit scenario,
+             * reaping all vCPUs after a single vCPU's ring fills up would
+             * skip that sleep, so reap only the vCPU whose ring is full.
+             */
+            if (dirtylimit_in_service()) {
+                kvm_dirty_ring_reap(kvm_state, cpu);
+            } else {
+                kvm_dirty_ring_reap(kvm_state, NULL);
+            }
             qemu_mutex_unlock_iothread();
+            dirtylimit_vcpu_execute(cpu);
             ret = 0;
             break;
         case KVM_EXIT_SYSTEM_EVENT:
@@ -3697,3 +3733,405 @@ static void kvm_type_init(void)
 }
 
 type_init(kvm_type_init);
typedef struct StatsArgs {
union StatsResultsType {
StatsResultList **stats;
StatsSchemaList **schema;
} result;
strList *names;
Error **errp;
} StatsArgs;
static StatsList *add_kvmstat_entry(struct kvm_stats_desc *pdesc,
uint64_t *stats_data,
StatsList *stats_list,
Error **errp)
{
Stats *stats;
uint64List *val_list = NULL;
/* Only add stats that we understand. */
switch (pdesc->flags & KVM_STATS_TYPE_MASK) {
case KVM_STATS_TYPE_CUMULATIVE:
case KVM_STATS_TYPE_INSTANT:
case KVM_STATS_TYPE_PEAK:
case KVM_STATS_TYPE_LINEAR_HIST:
case KVM_STATS_TYPE_LOG_HIST:
break;
default:
return stats_list;
}
switch (pdesc->flags & KVM_STATS_UNIT_MASK) {
case KVM_STATS_UNIT_NONE:
case KVM_STATS_UNIT_BYTES:
case KVM_STATS_UNIT_CYCLES:
case KVM_STATS_UNIT_SECONDS:
case KVM_STATS_UNIT_BOOLEAN:
break;
default:
return stats_list;
}
switch (pdesc->flags & KVM_STATS_BASE_MASK) {
case KVM_STATS_BASE_POW10:
case KVM_STATS_BASE_POW2:
break;
default:
return stats_list;
}
/* Alloc and populate data list */
stats = g_new0(Stats, 1);
stats->name = g_strdup(pdesc->name);
    stats->value = g_new0(StatsValue, 1);
if ((pdesc->flags & KVM_STATS_UNIT_MASK) == KVM_STATS_UNIT_BOOLEAN) {
stats->value->u.boolean = *stats_data;
stats->value->type = QTYPE_QBOOL;
} else if (pdesc->size == 1) {
stats->value->u.scalar = *stats_data;
stats->value->type = QTYPE_QNUM;
} else {
int i;
for (i = 0; i < pdesc->size; i++) {
QAPI_LIST_PREPEND(val_list, stats_data[i]);
}
stats->value->u.list = val_list;
stats->value->type = QTYPE_QLIST;
}
QAPI_LIST_PREPEND(stats_list, stats);
return stats_list;
}
static StatsSchemaValueList *add_kvmschema_entry(struct kvm_stats_desc *pdesc,
StatsSchemaValueList *list,
Error **errp)
{
StatsSchemaValueList *schema_entry = g_new0(StatsSchemaValueList, 1);
schema_entry->value = g_new0(StatsSchemaValue, 1);
switch (pdesc->flags & KVM_STATS_TYPE_MASK) {
case KVM_STATS_TYPE_CUMULATIVE:
schema_entry->value->type = STATS_TYPE_CUMULATIVE;
break;
case KVM_STATS_TYPE_INSTANT:
schema_entry->value->type = STATS_TYPE_INSTANT;
break;
case KVM_STATS_TYPE_PEAK:
schema_entry->value->type = STATS_TYPE_PEAK;
break;
case KVM_STATS_TYPE_LINEAR_HIST:
schema_entry->value->type = STATS_TYPE_LINEAR_HISTOGRAM;
schema_entry->value->bucket_size = pdesc->bucket_size;
schema_entry->value->has_bucket_size = true;
break;
case KVM_STATS_TYPE_LOG_HIST:
schema_entry->value->type = STATS_TYPE_LOG2_HISTOGRAM;
break;
default:
goto exit;
}
switch (pdesc->flags & KVM_STATS_UNIT_MASK) {
case KVM_STATS_UNIT_NONE:
break;
case KVM_STATS_UNIT_BOOLEAN:
schema_entry->value->has_unit = true;
schema_entry->value->unit = STATS_UNIT_BOOLEAN;
break;
case KVM_STATS_UNIT_BYTES:
schema_entry->value->has_unit = true;
schema_entry->value->unit = STATS_UNIT_BYTES;
break;
case KVM_STATS_UNIT_CYCLES:
schema_entry->value->has_unit = true;
schema_entry->value->unit = STATS_UNIT_CYCLES;
break;
case KVM_STATS_UNIT_SECONDS:
schema_entry->value->has_unit = true;
schema_entry->value->unit = STATS_UNIT_SECONDS;
break;
default:
goto exit;
}
schema_entry->value->exponent = pdesc->exponent;
if (pdesc->exponent) {
switch (pdesc->flags & KVM_STATS_BASE_MASK) {
case KVM_STATS_BASE_POW10:
schema_entry->value->has_base = true;
schema_entry->value->base = 10;
break;
case KVM_STATS_BASE_POW2:
schema_entry->value->has_base = true;
schema_entry->value->base = 2;
break;
default:
goto exit;
}
}
schema_entry->value->name = g_strdup(pdesc->name);
schema_entry->next = list;
return schema_entry;
exit:
g_free(schema_entry->value);
g_free(schema_entry);
return list;
}
/* Cached stats descriptors */
typedef struct StatsDescriptors {
const char *ident; /* cache key, currently the StatsTarget */
struct kvm_stats_desc *kvm_stats_desc;
struct kvm_stats_header *kvm_stats_header;
QTAILQ_ENTRY(StatsDescriptors) next;
} StatsDescriptors;
static QTAILQ_HEAD(, StatsDescriptors) stats_descriptors =
QTAILQ_HEAD_INITIALIZER(stats_descriptors);
/*
* Return the descriptors for 'target', that either have already been read
* or are retrieved from 'stats_fd'.
*/
static StatsDescriptors *find_stats_descriptors(StatsTarget target, int stats_fd,
Error **errp)
{
StatsDescriptors *descriptors;
const char *ident;
struct kvm_stats_desc *kvm_stats_desc;
struct kvm_stats_header *kvm_stats_header;
size_t size_desc;
ssize_t ret;
ident = StatsTarget_str(target);
QTAILQ_FOREACH(descriptors, &stats_descriptors, next) {
if (g_str_equal(descriptors->ident, ident)) {
return descriptors;
}
}
descriptors = g_new0(StatsDescriptors, 1);
/* Read stats header */
kvm_stats_header = g_malloc(sizeof(*kvm_stats_header));
ret = read(stats_fd, kvm_stats_header, sizeof(*kvm_stats_header));
if (ret != sizeof(*kvm_stats_header)) {
error_setg(errp, "KVM stats: failed to read stats header: "
"expected %zu actual %zu",
sizeof(*kvm_stats_header), ret);
g_free(descriptors);
return NULL;
}
size_desc = sizeof(*kvm_stats_desc) + kvm_stats_header->name_size;
/* Read stats descriptors */
kvm_stats_desc = g_malloc0_n(kvm_stats_header->num_desc, size_desc);
ret = pread(stats_fd, kvm_stats_desc,
size_desc * kvm_stats_header->num_desc,
kvm_stats_header->desc_offset);
if (ret != size_desc * kvm_stats_header->num_desc) {
error_setg(errp, "KVM stats: failed to read stats descriptors: "
"expected %zu actual %zu",
size_desc * kvm_stats_header->num_desc, ret);
g_free(descriptors);
g_free(kvm_stats_desc);
return NULL;
}
descriptors->kvm_stats_header = kvm_stats_header;
descriptors->kvm_stats_desc = kvm_stats_desc;
descriptors->ident = ident;
QTAILQ_INSERT_TAIL(&stats_descriptors, descriptors, next);
return descriptors;
}
static void query_stats(StatsResultList **result, StatsTarget target,
strList *names, int stats_fd, Error **errp)
{
struct kvm_stats_desc *kvm_stats_desc;
struct kvm_stats_header *kvm_stats_header;
StatsDescriptors *descriptors;
g_autofree uint64_t *stats_data = NULL;
struct kvm_stats_desc *pdesc;
StatsList *stats_list = NULL;
size_t size_desc, size_data = 0;
ssize_t ret;
int i;
descriptors = find_stats_descriptors(target, stats_fd, errp);
if (!descriptors) {
return;
}
kvm_stats_header = descriptors->kvm_stats_header;
kvm_stats_desc = descriptors->kvm_stats_desc;
size_desc = sizeof(*kvm_stats_desc) + kvm_stats_header->name_size;
/* Tally the total data size; read schema data */
for (i = 0; i < kvm_stats_header->num_desc; ++i) {
pdesc = (void *)kvm_stats_desc + i * size_desc;
size_data += pdesc->size * sizeof(*stats_data);
}
stats_data = g_malloc0(size_data);
ret = pread(stats_fd, stats_data, size_data, kvm_stats_header->data_offset);
if (ret != size_data) {
error_setg(errp, "KVM stats: failed to read data: "
"expected %zu actual %zu", size_data, ret);
return;
}
for (i = 0; i < kvm_stats_header->num_desc; ++i) {
uint64_t *stats;
pdesc = (void *)kvm_stats_desc + i * size_desc;
/* Add entry to the list */
stats = (void *)stats_data + pdesc->offset;
if (!apply_str_list_filter(pdesc->name, names)) {
continue;
}
stats_list = add_kvmstat_entry(pdesc, stats, stats_list, errp);
}
if (!stats_list) {
return;
}
switch (target) {
case STATS_TARGET_VM:
add_stats_entry(result, STATS_PROVIDER_KVM, NULL, stats_list);
break;
case STATS_TARGET_VCPU:
add_stats_entry(result, STATS_PROVIDER_KVM,
current_cpu->parent_obj.canonical_path,
stats_list);
break;
default:
break;
}
}
static void query_stats_schema(StatsSchemaList **result, StatsTarget target,
int stats_fd, Error **errp)
{
struct kvm_stats_desc *kvm_stats_desc;
struct kvm_stats_header *kvm_stats_header;
StatsDescriptors *descriptors;
struct kvm_stats_desc *pdesc;
StatsSchemaValueList *stats_list = NULL;
size_t size_desc;
int i;
descriptors = find_stats_descriptors(target, stats_fd, errp);
if (!descriptors) {
return;
}
kvm_stats_header = descriptors->kvm_stats_header;
kvm_stats_desc = descriptors->kvm_stats_desc;
size_desc = sizeof(*kvm_stats_desc) + kvm_stats_header->name_size;
/* Tally the total data size; read schema data */
for (i = 0; i < kvm_stats_header->num_desc; ++i) {
pdesc = (void *)kvm_stats_desc + i * size_desc;
stats_list = add_kvmschema_entry(pdesc, stats_list, errp);
}
add_stats_schema(result, STATS_PROVIDER_KVM, target, stats_list);
}
static void query_stats_vcpu(CPUState *cpu, run_on_cpu_data data)
{
StatsArgs *kvm_stats_args = (StatsArgs *) data.host_ptr;
int stats_fd = kvm_vcpu_ioctl(cpu, KVM_GET_STATS_FD, NULL);
Error *local_err = NULL;
if (stats_fd == -1) {
error_setg_errno(&local_err, errno, "KVM stats: ioctl failed");
error_propagate(kvm_stats_args->errp, local_err);
return;
}
query_stats(kvm_stats_args->result.stats, STATS_TARGET_VCPU,
kvm_stats_args->names, stats_fd, kvm_stats_args->errp);
close(stats_fd);
}
static void query_stats_schema_vcpu(CPUState *cpu, run_on_cpu_data data)
{
StatsArgs *kvm_stats_args = (StatsArgs *) data.host_ptr;
int stats_fd = kvm_vcpu_ioctl(cpu, KVM_GET_STATS_FD, NULL);
Error *local_err = NULL;
if (stats_fd == -1) {
error_setg_errno(&local_err, errno, "KVM stats: ioctl failed");
error_propagate(kvm_stats_args->errp, local_err);
return;
}
query_stats_schema(kvm_stats_args->result.schema, STATS_TARGET_VCPU, stats_fd,
kvm_stats_args->errp);
close(stats_fd);
}
static void query_stats_cb(StatsResultList **result, StatsTarget target,
strList *names, strList *targets, Error **errp)
{
KVMState *s = kvm_state;
CPUState *cpu;
int stats_fd;
switch (target) {
case STATS_TARGET_VM:
{
stats_fd = kvm_vm_ioctl(s, KVM_GET_STATS_FD, NULL);
if (stats_fd == -1) {
error_setg_errno(errp, errno, "KVM stats: ioctl failed");
return;
}
query_stats(result, target, names, stats_fd, errp);
close(stats_fd);
break;
}
case STATS_TARGET_VCPU:
{
StatsArgs stats_args;
stats_args.result.stats = result;
stats_args.names = names;
stats_args.errp = errp;
CPU_FOREACH(cpu) {
if (!apply_str_list_filter(cpu->parent_obj.canonical_path, targets)) {
continue;
}
run_on_cpu(cpu, query_stats_vcpu, RUN_ON_CPU_HOST_PTR(&stats_args));
}
break;
}
default:
break;
}
}
void query_stats_schemas_cb(StatsSchemaList **result, Error **errp)
{
StatsArgs stats_args;
KVMState *s = kvm_state;
int stats_fd;
stats_fd = kvm_vm_ioctl(s, KVM_GET_STATS_FD, NULL);
if (stats_fd == -1) {
error_setg_errno(errp, errno, "KVM stats: ioctl failed");
return;
}
query_stats_schema(result, STATS_TARGET_VM, stats_fd, errp);
close(stats_fd);
stats_args.result.schema = result;
stats_args.errp = errp;
run_on_cpu(first_cpu, query_stats_schema_vcpu, RUN_ON_CPU_HOST_PTR(&stats_args));
}


@@ -148,3 +148,8 @@ bool kvm_dirty_ring_enabled(void)
 {
     return false;
 }
+
+uint32_t kvm_dirty_ring_size(void)
+{
+    return 0;
+}


@@ -21,6 +21,13 @@ void tlb_set_dirty(CPUState *cpu, target_ulong vaddr)
 {
 }
 
+int probe_access_flags(CPUArchState *env, target_ulong addr,
+                       MMUAccessType access_type, int mmu_idx,
+                       bool nonfault, void **phost, uintptr_t retaddr)
+{
+    g_assert_not_reached();
+}
+
 void *probe_access(CPUArchState *env, target_ulong addr, int size,
                    MMUAccessType access_type, int mmu_idx, uintptr_t retaddr)
 {


@@ -905,6 +905,8 @@ TranslationBlock *libafl_gen_edge(CPUState *cpu, target_ulong src_block,
                 target_ulong dst_block, int exit_n, target_ulong cs_base,
                 uint32_t flags, int cflags);
 
+extern __thread int libafl_valid_current_cpu;
+
 //// --- End LibAFL code ---
 
 /* main execution loop */
@@ -917,6 +919,12 @@ int cpu_exec(CPUState *cpu)
     /* replay_interrupt may need current_cpu */
     current_cpu = cpu;
 
+    //// --- Begin LibAFL code ---
+
+    libafl_valid_current_cpu = 1;
+
+    //// --- End LibAFL code ---
+
     if (cpu_handle_halt(cpu)) {
         return EXCP_HALTED;
     }


@@ -2248,7 +2248,7 @@ store_helper_unaligned(CPUArchState *env, target_ulong addr, uint64_t val,
     const size_t tlb_off = offsetof(CPUTLBEntry, addr_write);
     uintptr_t index, index2;
     CPUTLBEntry *entry, *entry2;
-    target_ulong page2, tlb_addr, tlb_addr2;
+    target_ulong page1, page2, tlb_addr, tlb_addr2;
     MemOpIdx oi;
     size_t size2;
     int i;
@@ -2256,15 +2256,17 @@ store_helper_unaligned(CPUArchState *env, target_ulong addr, uint64_t val,
     /*
      * Ensure the second page is in the TLB.  Note that the first page
      * is already guaranteed to be filled, and that the second page
-     * cannot evict the first.
+     * cannot evict the first.  An exception to this rule is PAGE_WRITE_INV
+     * handling: the first page could have evicted itself.
      */
+    page1 = addr & TARGET_PAGE_MASK;
     page2 = (addr + size) & TARGET_PAGE_MASK;
     size2 = (addr + size) & ~TARGET_PAGE_MASK;
     index2 = tlb_index(env, mmu_idx, page2);
     entry2 = tlb_entry(env, mmu_idx, page2);
 
     tlb_addr2 = tlb_addr_write(entry2);
-    if (!tlb_hit_page(tlb_addr2, page2)) {
+    if (page1 != page2 && !tlb_hit_page(tlb_addr2, page2)) {
         if (!victim_tlb_hit(env, mmu_idx, index2, tlb_off, page2)) {
             tlb_fill(env_cpu(env), page2, size2, MMU_DATA_STORE,
                      mmu_idx, retaddr);


@@ -70,6 +70,8 @@ static void *mttcg_cpu_thread_fn(void *arg)
     assert(tcg_enabled());
     g_assert(!icount_enabled());
 
+    tcg_cpu_init_cflags(cpu, current_machine->smp.max_cpus > 1);
+
     rcu_register_thread();
     force_rcu.notifier.notify = mttcg_force_rcu;
     force_rcu.cpu = cpu;
@@ -95,6 +97,16 @@ static void *mttcg_cpu_thread_fn(void *arg)
             r = tcg_cpus_exec(cpu);
             qemu_mutex_lock_iothread();
             switch (r) {
+
+            //// --- Begin LibAFL code ---
+
+#define EXCP_LIBAFL_BP 0xf4775747
+
+            case EXCP_LIBAFL_BP:
+                break;
+
+            //// --- End LibAFL code ---
+
             case EXCP_DEBUG:
                 cpu_handle_guest_debug(cpu);
                 break;
@@ -139,9 +151,6 @@ void mttcg_start_vcpu_thread(CPUState *cpu)
 {
     char thread_name[VCPU_THREAD_NAME_SIZE];
 
-    g_assert(tcg_enabled());
-    tcg_cpu_init_cflags(cpu, current_machine->smp.max_cpus > 1);
-
     cpu->thread = g_new0(QemuThread, 1);
     cpu->halt_cond = g_malloc0(sizeof(QemuCond));
     qemu_cond_init(cpu->halt_cond);


@@ -139,6 +139,111 @@ static void rr_force_rcu(Notifier *notify, void *data)
     rr_kick_next_cpu();
 }
//// --- Begin LibAFL code ---
#define EXCP_LIBAFL_BP 0xf4775747
void libafl_cpu_thread_fn(CPUState *cpu);
static void default_libafl_start_vcpu(CPUState *cpu) {
libafl_cpu_thread_fn(cpu);
}
void (*libafl_start_vcpu)(CPUState *cpu) = default_libafl_start_vcpu;
void libafl_cpu_thread_fn(CPUState *cpu)
{
rr_start_kick_timer();
while (1) {
qemu_mutex_unlock_iothread();
replay_mutex_lock();
qemu_mutex_lock_iothread();
if (icount_enabled()) {
/* Account partial waits to QEMU_CLOCK_VIRTUAL. */
icount_account_warp_timer();
/*
* Run the timers here. This is much more efficient than
* waking up the I/O thread and waiting for completion.
*/
icount_handle_deadline();
}
replay_mutex_unlock();
if (!cpu) {
cpu = first_cpu;
}
while (cpu && cpu_work_list_empty(cpu) && !cpu->exit_request) {
qatomic_mb_set(&rr_current_cpu, cpu);
current_cpu = cpu;
qemu_clock_enable(QEMU_CLOCK_VIRTUAL,
(cpu->singlestep_enabled & SSTEP_NOTIMER) == 0);
if (cpu_can_run(cpu)) {
int r;
qemu_mutex_unlock_iothread();
if (icount_enabled()) {
icount_prepare_for_run(cpu);
}
r = tcg_cpus_exec(cpu);
if (icount_enabled()) {
icount_process_data(cpu);
}
qemu_mutex_lock_iothread();
// TODO(libafl) should we have the iothread lock on?
if (r == EXCP_LIBAFL_BP) {
rr_stop_kick_timer();
return;
}
if (r == EXCP_DEBUG) {
cpu_handle_guest_debug(cpu);
break;
} else if (r == EXCP_ATOMIC) {
qemu_mutex_unlock_iothread();
cpu_exec_step_atomic(cpu);
qemu_mutex_lock_iothread();
break;
}
} else if (cpu->stop) {
if (cpu->unplug) {
cpu = CPU_NEXT(cpu);
}
break;
}
cpu = CPU_NEXT(cpu);
} /* while (cpu && !cpu->exit_request).. */
/* Does not need qatomic_mb_set because a spurious wakeup is okay. */
qatomic_set(&rr_current_cpu, NULL);
if (cpu && cpu->exit_request) {
qatomic_mb_set(&cpu->exit_request, 0);
}
if (icount_enabled() && all_cpu_threads_idle()) {
/*
* When all cpus are sleeping (e.g in WFI), to avoid a deadlock
* in the main_loop, wake it up in order to start the warp timer.
*/
qemu_notify_event();
}
rr_wait_io_event();
rr_deal_with_unplugged_cpus();
}
}
//// --- End LibAFL code ---
 /*
  * In the single-threaded case each vCPU is simulated in turn.  If
  * there is more than a single vCPU we create a simple timer to kick
@@ -152,7 +257,9 @@ static void *rr_cpu_thread_fn(void *arg)
     Notifier force_rcu;
     CPUState *cpu = arg;
 
-    assert(tcg_enabled());
+    g_assert(tcg_enabled());
+    tcg_cpu_init_cflags(cpu, false);
+
     rcu_register_thread();
     force_rcu.notify = rr_force_rcu;
     rcu_add_force_rcu_notifier(&force_rcu);
@@ -177,13 +284,25 @@ static void *rr_cpu_thread_fn(void *arg)
         }
     }
 
-    rr_start_kick_timer();
+    // rr_start_kick_timer();
 
     cpu = first_cpu;
 
     /* process any pending work */
     cpu->exit_request = 1;
 
+    //// --- Begin LibAFL code ---
+
+    libafl_start_vcpu(cpu);
+
+    rcu_remove_force_rcu_notifier(&force_rcu);
+    rcu_unregister_thread();
+    return NULL;
+
+    //// --- End LibAFL code ---
+
+    // The following dead code is from the original QEMU codebase
+
     while (1) {
         qemu_mutex_unlock_iothread();
         replay_mutex_lock();
@@ -275,9 +394,6 @@ void rr_start_vcpu_thread(CPUState *cpu)
     static QemuCond *single_tcg_halt_cond;
     static QemuThread *single_tcg_cpu_thread;
 
-    g_assert(tcg_enabled());
-    tcg_cpu_init_cflags(cpu, false);
-
     if (!single_tcg_cpu_thread) {
         cpu->thread = g_new0(QemuThread, 1);
         cpu->halt_cond = g_new0(QemuCond, 1);


@@ -97,16 +97,17 @@ static void tcg_accel_ops_init(AccelOpsClass *ops)
         ops->create_vcpu_thread = mttcg_start_vcpu_thread;
         ops->kick_vcpu_thread = mttcg_kick_vcpu_thread;
         ops->handle_interrupt = tcg_handle_interrupt;
-    } else if (icount_enabled()) {
-        ops->create_vcpu_thread = rr_start_vcpu_thread;
-        ops->kick_vcpu_thread = rr_kick_vcpu_thread;
-        ops->handle_interrupt = icount_handle_interrupt;
-        ops->get_virtual_clock = icount_get;
-        ops->get_elapsed_ticks = icount_get;
     } else {
         ops->create_vcpu_thread = rr_start_vcpu_thread;
         ops->kick_vcpu_thread = rr_kick_vcpu_thread;
-        ops->handle_interrupt = tcg_handle_interrupt;
+
+        if (icount_enabled()) {
+            ops->handle_interrupt = icount_handle_interrupt;
+            ops->get_virtual_clock = icount_get;
+            ops->get_elapsed_ticks = icount_get;
+        } else {
+            ops->handle_interrupt = tcg_handle_interrupt;
+        }
     }
 }


@@ -82,6 +82,14 @@ static bool check_tcg_memory_orders_compatible(void)
 
 static bool default_mttcg_enabled(void)
 {
+    //// --- Begin LibAFL code ---
+
+    // Only the RR ops work with libafl_qemu, so avoid MTTCG by default
+    return false;
+
+    //// --- End LibAFL code ---
+
     if (icount_enabled() || TCG_OVERSIZED_GUEST) {
         return false;
     } else {


@@ -33,13 +33,71 @@
 //// --- Begin LibAFL code ---
#ifndef CONFIG_USER_ONLY
#include "sysemu/runstate.h"
#include "migration/snapshot.h"
#include "qapi/error.h"
#include "qemu/error-report.h"
#include "qemu/main-loop.h"
void libafl_save_qemu_snapshot(char *name);
void libafl_load_qemu_snapshot(char *name);
static void save_snapshot_cb(void* opaque)
{
char* name = (char*)opaque;
Error *err = NULL;
    if (!save_snapshot(name, true, NULL, false, NULL, &err)) {
error_report_err(err);
error_report("Could not save snapshot");
}
}
void libafl_save_qemu_snapshot(char *name)
{
aio_bh_schedule_oneshot_full(qemu_get_aio_context(), save_snapshot_cb, (void*)name, "save_snapshot");
}
static void load_snapshot_cb(void* opaque)
{
char* name = (char*)opaque;
Error *err = NULL;
int saved_vm_running = runstate_is_running();
vm_stop(RUN_STATE_RESTORE_VM);
bool loaded = load_snapshot(name, NULL, false, NULL, &err);
    if (!loaded) {
error_report_err(err);
error_report("Could not load snapshot");
}
if (loaded && saved_vm_running) {
vm_start();
}
}
void libafl_load_qemu_snapshot(char *name)
{
aio_bh_schedule_oneshot_full(qemu_get_aio_context(), load_snapshot_cb, (void*)name, "load_snapshot");
}
#endif
 #define EXCP_LIBAFL_BP 0xf4775747
 
+void libafl_qemu_trigger_breakpoint(CPUState* cpu);
+
+void libafl_qemu_trigger_breakpoint(CPUState* cpu)
+{
+    cpu->exception_index = EXCP_LIBAFL_BP;
+    cpu_loop_exit(cpu);
+}
+
 void HELPER(libafl_qemu_handle_breakpoint)(CPUArchState *env)
 {
-    CPUState* cpu = env_cpu(env);
-    cpu->exception_index = EXCP_LIBAFL_BP;
-    cpu_loop_exit(cpu);
+    libafl_qemu_trigger_breakpoint(env_cpu(env));
 }
 
 //// --- End LibAFL code ---


@@ -79,6 +79,7 @@ void libafl_gen_read_N(TCGv addr, size_t size);
 void libafl_gen_write(TCGv addr, MemOp ot);
 void libafl_gen_write_N(TCGv addr, size_t size);
 void libafl_gen_cmp(target_ulong pc, TCGv op0, TCGv op1, MemOp ot);
+void libafl_gen_backdoor(target_ulong pc);
 
 static TCGHelperInfo libafl_exec_edge_hook_info = {
     .func = NULL, .name = "libafl_exec_edge_hook", \
@@ -656,6 +657,37 @@ void libafl_gen_cmp(target_ulong pc, TCGv op0, TCGv op1, MemOp ot)
     }
 }
static TCGHelperInfo libafl_exec_backdoor_hook_info = {
.func = NULL, .name = "libafl_exec_backdoor_hook", \
.flags = dh_callflag(void), \
.typemask = dh_typemask(void, 0) | dh_typemask(tl, 1) | dh_typemask(i64, 2)
};
struct libafl_backdoor_hook {
void (*exec)(target_ulong pc, uint64_t data);
uint64_t data;
TCGHelperInfo helper_info;
struct libafl_backdoor_hook* next;
};
struct libafl_backdoor_hook* libafl_backdoor_hooks;
void libafl_add_backdoor_hook(void (*exec)(target_ulong pc, uint64_t data),
uint64_t data);
void libafl_add_backdoor_hook(void (*exec)(target_ulong pc, uint64_t data),
uint64_t data)
{
struct libafl_backdoor_hook* hook = malloc(sizeof(struct libafl_backdoor_hook));
hook->exec = exec;
hook->data = data;
hook->next = libafl_backdoor_hooks;
libafl_backdoor_hooks = hook;
memcpy(&hook->helper_info, &libafl_exec_backdoor_hook_info, sizeof(TCGHelperInfo));
hook->helper_info.func = exec;
libafl_helper_table_add(&hook->helper_info);
}
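The registration above is a classic prepend-to-a-singly-linked-list: each new hook becomes the list head, so the most recently registered hook runs first at dispatch. A standalone sketch of the same pattern (plain C, hypothetical names, `uint64_t` standing in for `target_ulong`):

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Simplified hook chain: registration prepends, dispatch walks the list
 * and invokes every callback with the PC and its registered data word. */
struct hook {
    void (*exec)(uint64_t pc, uint64_t data);
    uint64_t data;
    struct hook *next;
};

static struct hook *hooks;

static void add_hook(void (*exec)(uint64_t, uint64_t), uint64_t data)
{
    struct hook *h = malloc(sizeof(*h));
    h->exec = exec;
    h->data = data;
    h->next = hooks;   /* newest hook becomes the head */
    hooks = h;
}

static void run_hooks(uint64_t pc)
{
    for (struct hook *h = hooks; h; h = h->next) {
        h->exec(pc, h->data);
    }
}

static uint64_t sum;  /* accumulates (pc + data) per hook, for demonstration */
static void record(uint64_t pc, uint64_t data) { sum += pc + data; }
```

Prepending keeps registration O(1); dispatch order is the reverse of registration order, which is harmless here because the hooks are independent of each other.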
//// --- End LibAFL code ---

/* #define DEBUG_TB_INVALIDATE */

@@ -3063,6 +3095,15 @@ int page_get_flags(target_ulong address)
return p->flags;
}
/*
* Allow the target to decide if PAGE_TARGET_[12] may be reset.
* By default, they are not kept.
*/
#ifndef PAGE_TARGET_STICKY
#define PAGE_TARGET_STICKY 0
#endif
#define PAGE_STICKY (PAGE_ANON | PAGE_TARGET_STICKY)
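The `PAGE_STICKY` mask above generalizes the previous `PAGE_ANON`-only behaviour: when the flags of an existing page are updated, sticky bits survive while the protection bits are replaced. A minimal sketch with illustrative bit values (not QEMU's actual flag definitions):

```c
#include <assert.h>

/* Illustrative flag values only; QEMU's real assignments differ. */
#define PAGE_READ          0x1
#define PAGE_WRITE         0x2
#define PAGE_ANON          0x8
#define PAGE_TARGET_1      0x10
#define PAGE_TARGET_STICKY PAGE_TARGET_1
#define PAGE_STICKY        (PAGE_ANON | PAGE_TARGET_STICKY)

/* mprotect()-style update: replace protection bits, keep sticky bits. */
static int update_page_flags(int old_flags, int new_prot)
{
    return (old_flags & PAGE_STICKY) | new_prot;
}
```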
/* Modify the flags of a page and invalidate the code if necessary.
The flag PAGE_WRITE_ORG is positioned automatically depending
on PAGE_WRITE. The mmap_lock should already be held. */

@@ -3106,8 +3147,8 @@ void page_set_flags(target_ulong start, target_ulong end, int flags)
p->target_data = NULL;
p->flags = flags;
} else {
-/* Using mprotect on a page does not change MAP_ANON. */
-p->flags = (p->flags & PAGE_ANON) | flags;
+/* Using mprotect on a page does not change sticky bits. */
+p->flags = (p->flags & PAGE_STICKY) | flags;
}
}
}


@@ -43,6 +43,15 @@ extern struct libafl_hook* libafl_qemu_hooks;
struct libafl_hook* libafl_search_hook(target_ulong addr);
struct libafl_backdoor_hook {
void (*exec)(target_ulong pc, uint64_t data);
uint64_t data;
TCGHelperInfo helper_info;
struct libafl_backdoor_hook* next;
};
extern struct libafl_backdoor_hook* libafl_backdoor_hooks;
//// --- End LibAFL code ---

/* Pairs with tcg_clear_temp_count.

@@ -146,6 +155,41 @@ void translator_loop(const TranslatorOps *ops, DisasContextBase *db,
libafl_gen_cur_pc = db->pc_next;
// 0x0f, 0x3a, 0xf2, 0x44
uint8_t backdoor = translator_ldub(cpu->env_ptr, db, db->pc_next);
if (backdoor == 0x0f) {
backdoor = translator_ldub(cpu->env_ptr, db, db->pc_next + 1);
if (backdoor == 0x3a) {
backdoor = translator_ldub(cpu->env_ptr, db, db->pc_next + 2);
if (backdoor == 0xf2) {
backdoor = translator_ldub(cpu->env_ptr, db, db->pc_next + 3);
if (backdoor == 0x44) {
struct libafl_backdoor_hook* hk = libafl_backdoor_hooks;
while (hk) {
TCGv tmp0 = tcg_const_tl(db->pc_next);
TCGv_i64 tmp1 = tcg_const_i64(hk->data);
#if TARGET_LONG_BITS == 32
TCGTemp *tmp2[2] = { tcgv_i32_temp(tmp0), tcgv_i64_temp(tmp1) };
#else
TCGTemp *tmp2[2] = { tcgv_i64_temp(tmp0), tcgv_i64_temp(tmp1) };
#endif
tcg_gen_callN(hk->exec, NULL, 2, tmp2);
#if TARGET_LONG_BITS == 32
tcg_temp_free_i32(tmp0);
#else
tcg_temp_free_i64(tmp0);
#endif
tcg_temp_free_i64(tmp1);
hk = hk->next;
}
db->pc_next += 4;
goto post_translate_insn;
}
}
}
}
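The nested loads above recognise the 4-byte backdoor marker `0x0f 0x3a 0xf2 0x44` at the current translation PC, byte by byte. The same check as a standalone predicate over an already-fetched byte buffer (hypothetical helper, not part of the patch):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* The byte sequence used as the LibAFL backdoor marker. */
static const uint8_t BACKDOOR[4] = { 0x0f, 0x3a, 0xf2, 0x44 };

/* Return true if the 4 bytes at buf[pos] encode the backdoor marker. */
static bool is_backdoor(const uint8_t *buf, size_t len, size_t pos)
{
    if (pos + 4 > len) {   /* never read past the buffer */
        return false;
    }
    return buf[pos] == BACKDOOR[0] && buf[pos + 1] == BACKDOOR[1] &&
           buf[pos + 2] == BACKDOOR[2] && buf[pos + 3] == BACKDOOR[3];
}
```

In the translator the match additionally skips the guest PC past the 4 bytes, so the marker is consumed instead of being handed to the normal instruction decoder.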
//// --- End LibAFL code ---

/* Disassemble one instruction. The translate_insn hook should

@@ -162,6 +206,8 @@ void translator_loop(const TranslatorOps *ops, DisasContextBase *db,
ops->translate_insn(db, cpu);
}
post_translate_insn:
/* Stop translation if translate_insn so indicated. */
if (db->is_jmp != DISAS_NEXT) {
break;


@@ -28,7 +28,7 @@ endforeach

if dbus_display
module_ss = ss.source_set()
-module_ss.add(when: [gio, pixman, opengl, 'CONFIG_GIO'], if_true: files('dbusaudio.c'))
+module_ss.add(when: gio, if_true: files('dbusaudio.c'))
audio_modules += {'dbus': module_ss}
endif


@@ -26,6 +26,7 @@
#include "qapi/error.h"
#include "standard-headers/linux/virtio_crypto.h"
#include "crypto/cipher.h"
#include "crypto/akcipher.h"
#include "qom/object.h"

@@ -42,10 +43,11 @@ typedef struct CryptoDevBackendBuiltinSession {
QCryptoCipher *cipher;
uint8_t direction; /* encryption or decryption */
uint8_t type; /* cipher? hash? aead? */
QCryptoAkCipher *akcipher;
QTAILQ_ENTRY(CryptoDevBackendBuiltinSession) next;
} CryptoDevBackendBuiltinSession;

-/* Max number of symmetric sessions */
+/* Max number of symmetric/asymmetric sessions */
#define MAX_NUM_SESSIONS 256

#define CRYPTODEV_BUITLIN_MAX_AUTH_KEY_LEN 512
@@ -80,15 +82,17 @@ static void cryptodev_builtin_init(
backend->conf.crypto_services =
1u << VIRTIO_CRYPTO_SERVICE_CIPHER |
1u << VIRTIO_CRYPTO_SERVICE_HASH |
-1u << VIRTIO_CRYPTO_SERVICE_MAC;
+1u << VIRTIO_CRYPTO_SERVICE_MAC |
+1u << VIRTIO_CRYPTO_SERVICE_AKCIPHER;
backend->conf.cipher_algo_l = 1u << VIRTIO_CRYPTO_CIPHER_AES_CBC;
backend->conf.hash_algo = 1u << VIRTIO_CRYPTO_HASH_SHA1;
backend->conf.akcipher_algo = 1u << VIRTIO_CRYPTO_AKCIPHER_RSA;
/*
* Set the Maximum length of crypto request.
* Why this value? Just avoid to overflow when
* memory allocation for each crypto request.
*/
-backend->conf.max_size = LONG_MAX - sizeof(CryptoDevBackendSymOpInfo);
+backend->conf.max_size = LONG_MAX - sizeof(CryptoDevBackendOpInfo);
backend->conf.max_cipher_key_len = CRYPTODEV_BUITLIN_MAX_CIPHER_KEY_LEN;
backend->conf.max_auth_key_len = CRYPTODEV_BUITLIN_MAX_AUTH_KEY_LEN;
@@ -148,6 +152,55 @@ err:
return -1;
}
static int cryptodev_builtin_get_rsa_hash_algo(
int virtio_rsa_hash, Error **errp)
{
switch (virtio_rsa_hash) {
case VIRTIO_CRYPTO_RSA_MD5:
return QCRYPTO_HASH_ALG_MD5;
case VIRTIO_CRYPTO_RSA_SHA1:
return QCRYPTO_HASH_ALG_SHA1;
case VIRTIO_CRYPTO_RSA_SHA256:
return QCRYPTO_HASH_ALG_SHA256;
case VIRTIO_CRYPTO_RSA_SHA512:
return QCRYPTO_HASH_ALG_SHA512;
default:
error_setg(errp, "Unsupported rsa hash algo: %d", virtio_rsa_hash);
return -1;
}
}
static int cryptodev_builtin_set_rsa_options(
int virtio_padding_algo,
int virtio_hash_algo,
QCryptoAkCipherOptionsRSA *opt,
Error **errp)
{
if (virtio_padding_algo == VIRTIO_CRYPTO_RSA_PKCS1_PADDING) {
int hash_alg;
hash_alg = cryptodev_builtin_get_rsa_hash_algo(virtio_hash_algo, errp);
if (hash_alg < 0) {
return -1;
}
opt->hash_alg = hash_alg;
opt->padding_alg = QCRYPTO_RSA_PADDING_ALG_PKCS1;
return 0;
}
if (virtio_padding_algo == VIRTIO_CRYPTO_RSA_RAW_PADDING) {
opt->padding_alg = QCRYPTO_RSA_PADDING_ALG_RAW;
return 0;
}
error_setg(errp, "Unsupported rsa padding algo: %d", virtio_padding_algo);
return -1;
}
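The two helpers above map virtio wire constants onto internal qcrypto enums, failing closed on anything unknown: PKCS#1 v1.5 padding requires a supported hash, while raw padding needs none. A reduced sketch of that mapping shape with made-up constant values (not the real virtio or qcrypto numbers):

```c
#include <assert.h>

/* Illustrative stand-ins for the virtio and qcrypto constants. */
enum { RSA_RAW_PADDING = 0, RSA_PKCS1_PADDING = 1 };
enum { PAD_INVALID = -1, PAD_RAW = 10, PAD_PKCS1 = 11 };

/* PKCS#1 v1.5 needs a valid hash selection; raw padding does not.
 * Unknown padding values are rejected rather than passed through. */
static int map_padding(int virtio_padding, int have_valid_hash)
{
    switch (virtio_padding) {
    case RSA_PKCS1_PADDING:
        return have_valid_hash ? PAD_PKCS1 : PAD_INVALID;
    case RSA_RAW_PADDING:
        return PAD_RAW;
    default:
        return PAD_INVALID;
    }
}
```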
static int cryptodev_builtin_create_cipher_session(
CryptoDevBackendBuiltin *builtin,
CryptoDevBackendSymSessionInfo *sess_info,

@@ -240,26 +293,89 @@ static int cryptodev_builtin_create_cipher_session(
return index;
}
static int cryptodev_builtin_create_akcipher_session(
CryptoDevBackendBuiltin *builtin,
CryptoDevBackendAsymSessionInfo *sess_info,
Error **errp)
{
CryptoDevBackendBuiltinSession *sess;
QCryptoAkCipher *akcipher;
int index;
QCryptoAkCipherKeyType type;
QCryptoAkCipherOptions opts;
switch (sess_info->algo) {
case VIRTIO_CRYPTO_AKCIPHER_RSA:
opts.alg = QCRYPTO_AKCIPHER_ALG_RSA;
if (cryptodev_builtin_set_rsa_options(sess_info->u.rsa.padding_algo,
sess_info->u.rsa.hash_algo, &opts.u.rsa, errp) != 0) {
return -1;
}
break;
/* TODO support DSA&ECDSA until qemu crypto framework support these */
default:
error_setg(errp, "Unsupported akcipher alg %u", sess_info->algo);
return -1;
}
switch (sess_info->keytype) {
case VIRTIO_CRYPTO_AKCIPHER_KEY_TYPE_PUBLIC:
type = QCRYPTO_AKCIPHER_KEY_TYPE_PUBLIC;
break;
case VIRTIO_CRYPTO_AKCIPHER_KEY_TYPE_PRIVATE:
type = QCRYPTO_AKCIPHER_KEY_TYPE_PRIVATE;
break;
default:
error_setg(errp, "Unsupported akcipher keytype %u", sess_info->keytype);
return -1;
}
index = cryptodev_builtin_get_unused_session_index(builtin);
if (index < 0) {
error_setg(errp, "Total number of sessions created exceeds %u",
MAX_NUM_SESSIONS);
return -1;
}
akcipher = qcrypto_akcipher_new(&opts, type, sess_info->key,
sess_info->keylen, errp);
if (!akcipher) {
return -1;
}
sess = g_new0(CryptoDevBackendBuiltinSession, 1);
sess->akcipher = akcipher;
builtin->sessions[index] = sess;
return index;
}
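Session setup above relies on `cryptodev_builtin_get_unused_session_index()` to find a free slot in the fixed-size `sessions[]` table. A self-contained sketch of that allocator (hypothetical table size of 4 in place of `MAX_NUM_SESSIONS`):

```c
#include <assert.h>
#include <stddef.h>

#define MAX_SESSIONS 4

static void *sessions[MAX_SESSIONS];

/* Linear scan for the first free slot, as the builtin backend does;
 * returns -1 when the table is full. */
static int get_unused_session_index(void)
{
    for (int i = 0; i < MAX_SESSIONS; i++) {
        if (sessions[i] == NULL) {
            return i;
        }
    }
    return -1;
}
```

The returned index doubles as the session id handed back to the guest, which is why `close_session` can assert `session_id < MAX_NUM_SESSIONS` and index the table directly.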
-static int64_t cryptodev_builtin_sym_create_session(
+static int64_t cryptodev_builtin_create_session(
CryptoDevBackend *backend,
-CryptoDevBackendSymSessionInfo *sess_info,
+CryptoDevBackendSessionInfo *sess_info,
uint32_t queue_index, Error **errp)
{
CryptoDevBackendBuiltin *builtin =
CRYPTODEV_BACKEND_BUILTIN(backend);
-int64_t session_id = -1;
-int ret;
+CryptoDevBackendSymSessionInfo *sym_sess_info;
+CryptoDevBackendAsymSessionInfo *asym_sess_info;

switch (sess_info->op_code) {
case VIRTIO_CRYPTO_CIPHER_CREATE_SESSION:
-ret = cryptodev_builtin_create_cipher_session(
-builtin, sess_info, errp);
-if (ret < 0) {
-return ret;
-} else {
-session_id = ret;
-}
-break;
+sym_sess_info = &sess_info->u.sym_sess_info;
+return cryptodev_builtin_create_cipher_session(
+builtin, sym_sess_info, errp);
+
+case VIRTIO_CRYPTO_AKCIPHER_CREATE_SESSION:
+asym_sess_info = &sess_info->u.asym_sess_info;
+return cryptodev_builtin_create_akcipher_session(
+builtin, asym_sess_info, errp);
+
case VIRTIO_CRYPTO_HASH_CREATE_SESSION:
case VIRTIO_CRYPTO_MAC_CREATE_SESSION:
default:

@@ -268,50 +384,44 @@ static int64_t cryptodev_builtin_sym_create_session(
return -1;
}

-return session_id;
+return -1;
}
-static int cryptodev_builtin_sym_close_session(
+static int cryptodev_builtin_close_session(
CryptoDevBackend *backend,
uint64_t session_id,
uint32_t queue_index, Error **errp)
{
CryptoDevBackendBuiltin *builtin =
CRYPTODEV_BACKEND_BUILTIN(backend);
+CryptoDevBackendBuiltinSession *session;

assert(session_id < MAX_NUM_SESSIONS && builtin->sessions[session_id]);
-qcrypto_cipher_free(builtin->sessions[session_id]->cipher);
-g_free(builtin->sessions[session_id]);
+session = builtin->sessions[session_id];
+if (session->cipher) {
+qcrypto_cipher_free(session->cipher);
+} else if (session->akcipher) {
+qcrypto_akcipher_free(session->akcipher);
+}
+g_free(session);
builtin->sessions[session_id] = NULL;

return 0;
}
static int cryptodev_builtin_sym_operation(
-CryptoDevBackend *backend,
-CryptoDevBackendSymOpInfo *op_info,
-uint32_t queue_index, Error **errp)
+CryptoDevBackendBuiltinSession *sess,
+CryptoDevBackendSymOpInfo *op_info, Error **errp)
{
-CryptoDevBackendBuiltin *builtin =
-CRYPTODEV_BACKEND_BUILTIN(backend);
-CryptoDevBackendBuiltinSession *sess;
int ret;

-if (op_info->session_id >= MAX_NUM_SESSIONS ||
-builtin->sessions[op_info->session_id] == NULL) {
-error_setg(errp, "Cannot find a valid session id: %" PRIu64 "",
-op_info->session_id);
-return -VIRTIO_CRYPTO_INVSESS;
-}

if (op_info->op_type == VIRTIO_CRYPTO_SYM_OP_ALGORITHM_CHAINING) {
error_setg(errp,
"Algorithm chain is unsupported for cryptodev-builtin");
return -VIRTIO_CRYPTO_NOTSUPP;
}

-sess = builtin->sessions[op_info->session_id];
if (op_info->iv_len > 0) {
ret = qcrypto_cipher_setiv(sess->cipher, op_info->iv,
op_info->iv_len, errp);
@@ -333,9 +443,99 @@ static int cryptodev_builtin_sym_operation(
return -VIRTIO_CRYPTO_ERR;
}
}

return VIRTIO_CRYPTO_OK;
}
static int cryptodev_builtin_asym_operation(
CryptoDevBackendBuiltinSession *sess, uint32_t op_code,
CryptoDevBackendAsymOpInfo *op_info, Error **errp)
{
int ret;
switch (op_code) {
case VIRTIO_CRYPTO_AKCIPHER_ENCRYPT:
ret = qcrypto_akcipher_encrypt(sess->akcipher,
op_info->src, op_info->src_len,
op_info->dst, op_info->dst_len, errp);
break;
case VIRTIO_CRYPTO_AKCIPHER_DECRYPT:
ret = qcrypto_akcipher_decrypt(sess->akcipher,
op_info->src, op_info->src_len,
op_info->dst, op_info->dst_len, errp);
break;
case VIRTIO_CRYPTO_AKCIPHER_SIGN:
ret = qcrypto_akcipher_sign(sess->akcipher,
op_info->src, op_info->src_len,
op_info->dst, op_info->dst_len, errp);
break;
case VIRTIO_CRYPTO_AKCIPHER_VERIFY:
ret = qcrypto_akcipher_verify(sess->akcipher,
op_info->src, op_info->src_len,
op_info->dst, op_info->dst_len, errp);
break;
default:
return -VIRTIO_CRYPTO_ERR;
}
if (ret < 0) {
if (op_code == VIRTIO_CRYPTO_AKCIPHER_VERIFY) {
return -VIRTIO_CRYPTO_KEY_REJECTED;
}
return -VIRTIO_CRYPTO_ERR;
}
/* Buffer is too short, typically the driver should handle this case */
if (unlikely(ret > op_info->dst_len)) {
if (errp && !*errp) {
error_setg(errp, "dst buffer too short");
}
return -VIRTIO_CRYPTO_ERR;
}
op_info->dst_len = ret;
return VIRTIO_CRYPTO_OK;
}
static int cryptodev_builtin_operation(
CryptoDevBackend *backend,
CryptoDevBackendOpInfo *op_info,
uint32_t queue_index, Error **errp)
{
CryptoDevBackendBuiltin *builtin =
CRYPTODEV_BACKEND_BUILTIN(backend);
CryptoDevBackendBuiltinSession *sess;
CryptoDevBackendSymOpInfo *sym_op_info;
CryptoDevBackendAsymOpInfo *asym_op_info;
enum CryptoDevBackendAlgType algtype = op_info->algtype;
int ret = -VIRTIO_CRYPTO_ERR;
if (op_info->session_id >= MAX_NUM_SESSIONS ||
builtin->sessions[op_info->session_id] == NULL) {
error_setg(errp, "Cannot find a valid session id: %" PRIu64 "",
op_info->session_id);
return -VIRTIO_CRYPTO_INVSESS;
}
sess = builtin->sessions[op_info->session_id];
if (algtype == CRYPTODEV_BACKEND_ALG_SYM) {
sym_op_info = op_info->u.sym_op_info;
ret = cryptodev_builtin_sym_operation(sess, sym_op_info, errp);
} else if (algtype == CRYPTODEV_BACKEND_ALG_ASYM) {
asym_op_info = op_info->u.asym_op_info;
ret = cryptodev_builtin_asym_operation(sess, op_info->op_code,
asym_op_info, errp);
}
return ret;
}
static void cryptodev_builtin_cleanup(
CryptoDevBackend *backend,
Error **errp)

@@ -348,7 +548,7 @@ static void cryptodev_builtin_cleanup(
for (i = 0; i < MAX_NUM_SESSIONS; i++) {
if (builtin->sessions[i] != NULL) {
-cryptodev_builtin_sym_close_session(backend, i, 0, &error_abort);
+cryptodev_builtin_close_session(backend, i, 0, &error_abort);
}
}
@@ -370,9 +570,9 @@ cryptodev_builtin_class_init(ObjectClass *oc, void *data)
bc->init = cryptodev_builtin_init;
bc->cleanup = cryptodev_builtin_cleanup;
-bc->create_session = cryptodev_builtin_sym_create_session;
-bc->close_session = cryptodev_builtin_sym_close_session;
-bc->do_sym_op = cryptodev_builtin_sym_operation;
+bc->create_session = cryptodev_builtin_create_session;
+bc->close_session = cryptodev_builtin_close_session;
+bc->do_op = cryptodev_builtin_operation;
}
static const TypeInfo cryptodev_builtin_info = {


@@ -259,7 +259,33 @@ static int64_t cryptodev_vhost_user_sym_create_session(
return -1;
}

static int64_t cryptodev_vhost_user_create_session(
CryptoDevBackend *backend,
CryptoDevBackendSessionInfo *sess_info,
uint32_t queue_index, Error **errp)
{
uint32_t op_code = sess_info->op_code;
CryptoDevBackendSymSessionInfo *sym_sess_info;
switch (op_code) {
case VIRTIO_CRYPTO_CIPHER_CREATE_SESSION:
case VIRTIO_CRYPTO_HASH_CREATE_SESSION:
case VIRTIO_CRYPTO_MAC_CREATE_SESSION:
case VIRTIO_CRYPTO_AEAD_CREATE_SESSION:
sym_sess_info = &sess_info->u.sym_sess_info;
return cryptodev_vhost_user_sym_create_session(backend, sym_sess_info,
queue_index, errp);
default:
error_setg(errp, "Unsupported opcode :%" PRIu32 "",
sess_info->op_code);
return -1;
}
return -1;
}
static int cryptodev_vhost_user_close_session(
CryptoDevBackend *backend,
uint64_t session_id,
uint32_t queue_index, Error **errp)
@@ -351,9 +377,9 @@ cryptodev_vhost_user_class_init(ObjectClass *oc, void *data)
bc->init = cryptodev_vhost_user_init;
bc->cleanup = cryptodev_vhost_user_cleanup;
-bc->create_session = cryptodev_vhost_user_sym_create_session;
-bc->close_session = cryptodev_vhost_user_sym_close_session;
-bc->do_sym_op = NULL;
+bc->create_session = cryptodev_vhost_user_create_session;
+bc->close_session = cryptodev_vhost_user_close_session;
+bc->do_op = NULL;

object_class_property_add_str(oc, "chardev",
cryptodev_vhost_user_get_chardev,


@@ -72,9 +72,9 @@ void cryptodev_backend_cleanup(
}
}

-int64_t cryptodev_backend_sym_create_session(
+int64_t cryptodev_backend_create_session(
CryptoDevBackend *backend,
-CryptoDevBackendSymSessionInfo *sess_info,
+CryptoDevBackendSessionInfo *sess_info,
uint32_t queue_index, Error **errp)
{
CryptoDevBackendClass *bc =
@@ -87,7 +87,7 @@ int64_t cryptodev_backend_sym_create_session(
return -1;
}

-int cryptodev_backend_sym_close_session(
+int cryptodev_backend_close_session(
CryptoDevBackend *backend,
uint64_t session_id,
uint32_t queue_index, Error **errp)
@@ -102,16 +102,16 @@ int cryptodev_backend_sym_close_session(
return -1;
}

-static int cryptodev_backend_sym_operation(
+static int cryptodev_backend_operation(
CryptoDevBackend *backend,
-CryptoDevBackendSymOpInfo *op_info,
+CryptoDevBackendOpInfo *op_info,
uint32_t queue_index, Error **errp)
{
CryptoDevBackendClass *bc =
CRYPTODEV_BACKEND_GET_CLASS(backend);

-if (bc->do_sym_op) {
-return bc->do_sym_op(backend, op_info, queue_index, errp);
+if (bc->do_op) {
+return bc->do_op(backend, op_info, queue_index, errp);
}

return -VIRTIO_CRYPTO_ERR;
@@ -123,20 +123,18 @@ int cryptodev_backend_crypto_operation(
uint32_t queue_index, Error **errp)
{
VirtIOCryptoReq *req = opaque;
+CryptoDevBackendOpInfo *op_info = &req->op_info;
+enum CryptoDevBackendAlgType algtype = req->flags;

-if (req->flags == CRYPTODEV_BACKEND_ALG_SYM) {
-CryptoDevBackendSymOpInfo *op_info;
-op_info = req->u.sym_op_info;
-return cryptodev_backend_sym_operation(backend,
-op_info, queue_index, errp);
-} else {
-error_setg(errp, "Unsupported cryptodev alg type: %" PRIu32 "",
-req->flags);
+if ((algtype != CRYPTODEV_BACKEND_ALG_SYM)
+&& (algtype != CRYPTODEV_BACKEND_ALG_ASYM)) {
+error_setg(errp, "Unsupported cryptodev alg type: %" PRIu32 "",
+algtype);
return -VIRTIO_CRYPTO_NOTSUPP;
}

-return -VIRTIO_CRYPTO_ERR;
+return cryptodev_backend_operation(backend, op_info, queue_index, errp);
}
static void

block.c

@@ -1037,7 +1037,7 @@ static int find_image_format(BlockBackend *file, const char *filename,
return ret;
}

-ret = blk_pread(file, 0, buf, sizeof(buf));
+ret = blk_pread(file, 0, sizeof(buf), buf, 0);
if (ret < 0) {
error_setg_errno(errp, -ret, "Could not read image for determining its "
"format");
@@ -1045,14 +1045,16 @@ static int find_image_format(BlockBackend *file, const char *filename,
return ret;
}

-drv = bdrv_probe_all(buf, ret, filename);
+drv = bdrv_probe_all(buf, sizeof(buf), filename);
if (!drv) {
error_setg(errp, "Could not determine image format: No compatible "
"driver found");
-ret = -ENOENT;
+*pdrv = NULL;
+return -ENOENT;
}
*pdrv = drv;
-return ret;
+return 0;
}
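`bdrv_probe_all()` asks each driver to score the header bytes and picks the highest scorer; passing `sizeof(buf)` instead of the short-read count matches the changed `blk_pread()` contract, under which the whole buffer is filled. A toy two-driver version of that scoring scheme (hypothetical drivers and scores, loosely modelled on the real probe loop):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Each driver reports a confidence score for the header bytes;
 * the highest score wins, NULL means no driver matched. */
struct driver {
    const char *name;
    int (*probe)(const uint8_t *buf, size_t len);
};

static int probe_qcow2(const uint8_t *buf, size_t len)
{
    /* qcow2 images start with the magic "QFI\xfb". */
    return (len >= 4 && memcmp(buf, "QFI\xfb", 4) == 0) ? 100 : 0;
}

static int probe_raw(const uint8_t *buf, size_t len)
{
    (void)buf; (void)len;
    return 1;   /* raw matches anything, with the lowest priority */
}

static const struct driver drivers[] = {
    { "qcow2", probe_qcow2 },
    { "raw",   probe_raw },
};

static const char *probe_all(const uint8_t *buf, size_t len)
{
    const char *best = NULL;
    int best_score = 0;
    for (size_t i = 0; i < sizeof(drivers) / sizeof(drivers[0]); i++) {
        int score = drivers[i].probe(buf, len);
        if (score > best_score) {
            best_score = score;
            best = drivers[i].name;
        }
    }
    return best;
}
```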
/** /**


@@ -228,15 +228,13 @@ out:

static void backup_init_bcs_bitmap(BackupBlockJob *job)
{
-bool ret;
uint64_t estimate;
BdrvDirtyBitmap *bcs_bitmap = block_copy_dirty_bitmap(job->bcs);

if (job->sync_mode == MIRROR_SYNC_MODE_BITMAP) {
bdrv_clear_dirty_bitmap(bcs_bitmap, NULL);
-ret = bdrv_dirty_bitmap_merge_internal(bcs_bitmap, job->sync_bitmap,
-NULL, true);
-assert(ret);
+bdrv_dirty_bitmap_merge_internal(bcs_bitmap, job->sync_bitmap, NULL,
+true);
} else if (job->sync_mode == MIRROR_SYNC_MODE_TOP) {
/*
* We can't hog the coroutine to initialize this thoroughly.


@@ -107,8 +107,8 @@ static uint64_t blk_log_writes_find_cur_log_sector(BdrvChild *log,
struct log_write_entry cur_entry;

while (cur_idx < nr_entries) {
-int read_ret = bdrv_pread(log, cur_sector << sector_bits, &cur_entry,
-sizeof(cur_entry));
+int read_ret = bdrv_pread(log, cur_sector << sector_bits,
+sizeof(cur_entry), &cur_entry, 0);
if (read_ret < 0) {
error_setg_errno(errp, -read_ret,
"Failed to read log entry %"PRIu64, cur_idx);

@@ -190,7 +190,7 @@ static int blk_log_writes_open(BlockDriverState *bs, QDict *options, int flags,
log_sb.nr_entries = cpu_to_le64(0);
log_sb.sectorsize = cpu_to_le32(BDRV_SECTOR_SIZE);
} else {
-ret = bdrv_pread(s->log_file, 0, &log_sb, sizeof(log_sb));
+ret = bdrv_pread(s->log_file, 0, sizeof(log_sb), &log_sb, 0);
if (ret < 0) {
error_setg_errno(errp, -ret, "Could not read log superblock");
goto fail_log;


@@ -56,9 +56,6 @@ struct BlockBackend {
const BlockDevOps *dev_ops;
void *dev_opaque;

-/* the block size for which the guest device expects atomicity */
-int guest_block_size;

/* If the BDS tree is removed, some of its options are stored here (which
* can be used to restore those options in the new BDS on insert) */
BlockBackendRootState root_state;

@@ -998,7 +995,6 @@ void blk_detach_dev(BlockBackend *blk, DeviceState *dev)
blk->dev = NULL;
blk->dev_ops = NULL;
blk->dev_opaque = NULL;
-blk->guest_block_size = 512;
blk_set_perm(blk, 0, BLK_PERM_ALL, &error_abort);
blk_unref(blk);
}

@@ -1062,7 +1058,7 @@ void blk_set_dev_ops(BlockBackend *blk, const BlockDevOps *ops,
blk->dev_opaque = opaque;

/* Are we currently quiesced? Should we enforce this right now? */
-if (blk->quiesce_counter && ops->drained_begin) {
+if (blk->quiesce_counter && ops && ops->drained_begin) {
ops->drained_begin(opaque);
}
}
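The last hunk above hardens `blk_set_dev_ops()` against a NULL ops table: both the table pointer and the individual callback are optional, so each must be checked before the call. The guard pattern in isolation (hypothetical types, not QEMU's):

```c
#include <assert.h>
#include <stddef.h>

/* Optional callback table: either the table or the member may be NULL. */
struct dev_ops {
    void (*drained_begin)(void *opaque);
};

static int begun;
static void on_drain(void *opaque) { (void)opaque; begun = 1; }

static void set_dev_ops(const struct dev_ops *ops, void *opaque, int quiesced)
{
    /* Short-circuit evaluation: ops is tested before dereferencing it. */
    if (quiesced && ops && ops->drained_begin) {
        ops->drained_begin(opaque);
    }
}
```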
@@ -1284,9 +1280,10 @@ static void coroutine_fn blk_wait_while_drained(BlockBackend *blk)
}

/* To be called between exactly one pair of blk_inc/dec_in_flight() */
-int coroutine_fn
-blk_co_do_preadv(BlockBackend *blk, int64_t offset, int64_t bytes,
-QEMUIOVector *qiov, BdrvRequestFlags flags)
+static int coroutine_fn
+blk_co_do_preadv_part(BlockBackend *blk, int64_t offset, int64_t bytes,
+QEMUIOVector *qiov, size_t qiov_offset,
+BdrvRequestFlags flags)
{
int ret;
BlockDriverState *bs;

@@ -1311,11 +1308,23 @@ blk_co_do_preadv(BlockBackend *blk, int64_t offset, int64_t bytes,
bytes, false);
}

-ret = bdrv_co_preadv(blk->root, offset, bytes, qiov, flags);
+ret = bdrv_co_preadv_part(blk->root, offset, bytes, qiov, qiov_offset,
+flags);
bdrv_dec_in_flight(bs);
return ret;
}
int coroutine_fn blk_co_pread(BlockBackend *blk, int64_t offset, int64_t bytes,
void *buf, BdrvRequestFlags flags)
{
QEMUIOVector qiov = QEMU_IOVEC_INIT_BUF(qiov, buf, bytes);
IO_OR_GS_CODE();
assert(bytes <= SIZE_MAX);
return blk_co_preadv(blk, offset, bytes, &qiov, flags);
}
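`blk_co_pread()` above is a convenience wrapper: it wraps the caller's flat buffer in a one-element I/O vector and forwards to the vectored variant. A standalone sketch of that wrapping pattern over a fake backing store (hypothetical names, POSIX `struct iovec` standing in for `QEMUIOVector`):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Stand-in for the device contents. */
static unsigned char backing[16] = "qemu-block-data";

/* Vectored read: copy into each element, fail on out-of-range access. */
static ssize_t readv_at(size_t offset, struct iovec *iov, int iovcnt)
{
    size_t done = 0;
    for (int i = 0; i < iovcnt; i++) {
        if (offset + done + iov[i].iov_len > sizeof(backing)) {
            return -1;
        }
        memcpy(iov[i].iov_base, backing + offset + done, iov[i].iov_len);
        done += iov[i].iov_len;
    }
    return (ssize_t)done;
}

/* Flat-buffer wrapper: build a one-element vector, then delegate. */
static ssize_t pread_buf(size_t offset, void *buf, size_t bytes)
{
    struct iovec iov = { .iov_base = buf, .iov_len = bytes };
    return readv_at(offset, &iov, 1);
}
```

Keeping one vectored implementation and thin wrappers mirrors the patch's direction: `blk_co_pread`/`blk_co_pwrite` replace the old synchronous `blk_pread`/`blk_pwrite` helpers removed further down.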
int coroutine_fn blk_co_preadv(BlockBackend *blk, int64_t offset,
int64_t bytes, QEMUIOVector *qiov,
BdrvRequestFlags flags)

@@ -1324,14 +1333,28 @@ int coroutine_fn blk_co_preadv(BlockBackend *blk, int64_t offset,
IO_OR_GS_CODE();

blk_inc_in_flight(blk);
-ret = blk_co_do_preadv(blk, offset, bytes, qiov, flags);
+ret = blk_co_do_preadv_part(blk, offset, bytes, qiov, 0, flags);
blk_dec_in_flight(blk);
return ret;
}
int coroutine_fn blk_co_preadv_part(BlockBackend *blk, int64_t offset,
int64_t bytes, QEMUIOVector *qiov,
size_t qiov_offset, BdrvRequestFlags flags)
{
int ret;
IO_OR_GS_CODE();
blk_inc_in_flight(blk);
ret = blk_co_do_preadv_part(blk, offset, bytes, qiov, qiov_offset, flags);
blk_dec_in_flight(blk);
return ret;
}

/* To be called between exactly one pair of blk_inc/dec_in_flight() */
-int coroutine_fn
+static int coroutine_fn
blk_co_do_pwritev_part(BlockBackend *blk, int64_t offset, int64_t bytes,
QEMUIOVector *qiov, size_t qiov_offset,
BdrvRequestFlags flags)
@@ -1383,6 +1406,17 @@ int coroutine_fn blk_co_pwritev_part(BlockBackend *blk, int64_t offset,
return ret;
}
int coroutine_fn blk_co_pwrite(BlockBackend *blk, int64_t offset, int64_t bytes,
const void *buf, BdrvRequestFlags flags)
{
QEMUIOVector qiov = QEMU_IOVEC_INIT_BUF(qiov, buf, bytes);
IO_OR_GS_CODE();
assert(bytes <= SIZE_MAX);
return blk_co_pwritev(blk, offset, bytes, &qiov, flags);
}
int coroutine_fn blk_co_pwritev(BlockBackend *blk, int64_t offset,
int64_t bytes, QEMUIOVector *qiov,
BdrvRequestFlags flags)

@@ -1391,20 +1425,6 @@ int coroutine_fn blk_co_pwritev(BlockBackend *blk, int64_t offset,
return blk_co_pwritev_part(blk, offset, bytes, qiov, 0, flags);
}
static int coroutine_fn blk_pwritev_part(BlockBackend *blk, int64_t offset,
int64_t bytes,
QEMUIOVector *qiov, size_t qiov_offset,
BdrvRequestFlags flags)
{
int ret;
blk_inc_in_flight(blk);
ret = blk_do_pwritev_part(blk, offset, bytes, qiov, qiov_offset, flags);
blk_dec_in_flight(blk);
return ret;
}
typedef struct BlkRwCo {
BlockBackend *blk;
int64_t offset;

@@ -1413,14 +1433,6 @@ typedef struct BlkRwCo {
BdrvRequestFlags flags;
} BlkRwCo;
int blk_pwrite_zeroes(BlockBackend *blk, int64_t offset,
int64_t bytes, BdrvRequestFlags flags)
{
IO_OR_GS_CODE();
return blk_pwritev_part(blk, offset, bytes, NULL, 0,
flags | BDRV_REQ_ZERO_WRITE);
}
int blk_make_zero(BlockBackend *blk, BdrvRequestFlags flags)
{
GLOBAL_STATE_CODE();

@@ -1541,8 +1553,8 @@ static void blk_aio_read_entry(void *opaque)
QEMUIOVector *qiov = rwco->iobuf;

assert(qiov->size == acb->bytes);
-rwco->ret = blk_co_do_preadv(rwco->blk, rwco->offset, acb->bytes,
-qiov, rwco->flags);
+rwco->ret = blk_co_do_preadv_part(rwco->blk, rwco->offset, acb->bytes, qiov,
+0, rwco->flags);
blk_aio_complete(acb);
}

@@ -1567,31 +1579,6 @@ BlockAIOCB *blk_aio_pwrite_zeroes(BlockBackend *blk, int64_t offset,
flags | BDRV_REQ_ZERO_WRITE, cb, opaque);
}
int blk_pread(BlockBackend *blk, int64_t offset, void *buf, int bytes)
{
int ret;
QEMUIOVector qiov = QEMU_IOVEC_INIT_BUF(qiov, buf, bytes);
IO_OR_GS_CODE();
blk_inc_in_flight(blk);
ret = blk_do_preadv(blk, offset, bytes, &qiov, 0);
blk_dec_in_flight(blk);
return ret < 0 ? ret : bytes;
}
int blk_pwrite(BlockBackend *blk, int64_t offset, const void *buf, int bytes,
BdrvRequestFlags flags)
{
int ret;
QEMUIOVector qiov = QEMU_IOVEC_INIT_BUF(qiov, buf, bytes);
IO_OR_GS_CODE();
ret = blk_pwritev_part(blk, offset, bytes, &qiov, 0, flags);
return ret < 0 ? ret : bytes;
}
int64_t blk_getlength(BlockBackend *blk)
{
IO_CODE();

@@ -1655,7 +1642,7 @@ void blk_aio_cancel_async(BlockAIOCB *acb)
}

/* To be called between exactly one pair of blk_inc/dec_in_flight() */
-int coroutine_fn
+static int coroutine_fn
blk_co_do_ioctl(BlockBackend *blk, unsigned long int req, void *buf)
{
IO_CODE();

@@ -1669,13 +1656,14 @@ blk_co_do_ioctl(BlockBackend *blk, unsigned long int req, void *buf)
return bdrv_co_ioctl(blk_bs(blk), req, buf);
}

-int blk_ioctl(BlockBackend *blk, unsigned long int req, void *buf)
+int coroutine_fn blk_co_ioctl(BlockBackend *blk, unsigned long int req,
+void *buf)
{
int ret;
IO_OR_GS_CODE();

blk_inc_in_flight(blk);
-ret = blk_do_ioctl(blk, req, buf);
+ret = blk_co_do_ioctl(blk, req, buf);
blk_dec_in_flight(blk);

return ret;
@ -1699,7 +1687,7 @@ BlockAIOCB *blk_aio_ioctl(BlockBackend *blk, unsigned long int req, void *buf,
} }
/* To be called between exactly one pair of blk_inc/dec_in_flight() */ /* To be called between exactly one pair of blk_inc/dec_in_flight() */
int coroutine_fn static int coroutine_fn
blk_co_do_pdiscard(BlockBackend *blk, int64_t offset, int64_t bytes) blk_co_do_pdiscard(BlockBackend *blk, int64_t offset, int64_t bytes)
{ {
int ret; int ret;
@ -1746,20 +1734,8 @@ int coroutine_fn blk_co_pdiscard(BlockBackend *blk, int64_t offset,
return ret; return ret;
} }
int blk_pdiscard(BlockBackend *blk, int64_t offset, int64_t bytes)
{
int ret;
IO_OR_GS_CODE();
blk_inc_in_flight(blk);
ret = blk_do_pdiscard(blk, offset, bytes);
blk_dec_in_flight(blk);
return ret;
}
/* To be called between exactly one pair of blk_inc/dec_in_flight() */ /* To be called between exactly one pair of blk_inc/dec_in_flight() */
int coroutine_fn blk_co_do_flush(BlockBackend *blk) static int coroutine_fn blk_co_do_flush(BlockBackend *blk)
{ {
blk_wait_while_drained(blk); blk_wait_while_drained(blk);
IO_CODE(); IO_CODE();
@ -1799,17 +1775,6 @@ int coroutine_fn blk_co_flush(BlockBackend *blk)
return ret; return ret;
} }
int blk_flush(BlockBackend *blk)
{
int ret;
blk_inc_in_flight(blk);
ret = blk_do_flush(blk);
blk_dec_in_flight(blk);
return ret;
}
void blk_drain(BlockBackend *blk) void blk_drain(BlockBackend *blk)
{ {
BlockDriverState *bs = blk_bs(blk); BlockDriverState *bs = blk_bs(blk);
@ -2100,12 +2065,6 @@ int blk_get_max_iov(BlockBackend *blk)
return blk->root->bs->bl.max_iov; return blk->root->bs->bl.max_iov;
} }
void blk_set_guest_block_size(BlockBackend *blk, int align)
{
IO_CODE();
blk->guest_block_size = align;
}
void *blk_try_blockalign(BlockBackend *blk, size_t size) void *blk_try_blockalign(BlockBackend *blk, size_t size)
{ {
IO_CODE(); IO_CODE();
@ -2347,17 +2306,18 @@ int coroutine_fn blk_co_pwrite_zeroes(BlockBackend *blk, int64_t offset,
flags | BDRV_REQ_ZERO_WRITE); flags | BDRV_REQ_ZERO_WRITE);
} }
int blk_pwrite_compressed(BlockBackend *blk, int64_t offset, const void *buf, int coroutine_fn blk_co_pwrite_compressed(BlockBackend *blk, int64_t offset,
int64_t bytes) int64_t bytes, const void *buf)
{ {
QEMUIOVector qiov = QEMU_IOVEC_INIT_BUF(qiov, buf, bytes); QEMUIOVector qiov = QEMU_IOVEC_INIT_BUF(qiov, buf, bytes);
IO_OR_GS_CODE(); IO_OR_GS_CODE();
return blk_pwritev_part(blk, offset, bytes, &qiov, 0, return blk_co_pwritev_part(blk, offset, bytes, &qiov, 0,
BDRV_REQ_WRITE_COMPRESSED); BDRV_REQ_WRITE_COMPRESSED);
} }
int blk_truncate(BlockBackend *blk, int64_t offset, bool exact, int coroutine_fn blk_co_truncate(BlockBackend *blk, int64_t offset, bool exact,
PreallocMode prealloc, BdrvRequestFlags flags, Error **errp) PreallocMode prealloc, BdrvRequestFlags flags,
Error **errp)
{ {
IO_OR_GS_CODE(); IO_OR_GS_CODE();
if (!blk_is_available(blk)) { if (!blk_is_available(blk)) {
@ -2365,7 +2325,7 @@ int blk_truncate(BlockBackend *blk, int64_t offset, bool exact,
return -ENOMEDIUM; return -ENOMEDIUM;
} }
return bdrv_truncate(blk->root, offset, exact, prealloc, flags, errp); return bdrv_co_truncate(blk->root, offset, exact, prealloc, flags, errp);
} }
int blk_save_vmstate(BlockBackend *blk, const uint8_t *buf, int blk_save_vmstate(BlockBackend *blk, const uint8_t *buf,

block/block-copy.c

@@ -883,23 +883,42 @@ static int coroutine_fn block_copy_common(BlockCopyCallState *call_state)
     return ret;
 }
 
+static void coroutine_fn block_copy_async_co_entry(void *opaque)
+{
+    block_copy_common(opaque);
+}
+
 int coroutine_fn block_copy(BlockCopyState *s, int64_t start, int64_t bytes,
-                            bool ignore_ratelimit)
+                            bool ignore_ratelimit, uint64_t timeout_ns,
+                            BlockCopyAsyncCallbackFunc cb,
+                            void *cb_opaque)
 {
-    BlockCopyCallState call_state = {
+    int ret;
+    BlockCopyCallState *call_state = g_new(BlockCopyCallState, 1);
+
+    *call_state = (BlockCopyCallState) {
         .s = s,
         .offset = start,
         .bytes = bytes,
         .ignore_ratelimit = ignore_ratelimit,
         .max_workers = BLOCK_COPY_MAX_WORKERS,
+        .cb = cb,
+        .cb_opaque = cb_opaque,
     };
 
-    return block_copy_common(&call_state);
-}
+    ret = qemu_co_timeout(block_copy_async_co_entry, call_state, timeout_ns,
+                          g_free);
+    if (ret < 0) {
+        assert(ret == -ETIMEDOUT);
+        block_copy_call_cancel(call_state);
+        /* call_state will be freed by running coroutine. */
+        return ret;
+    }
 
-static void coroutine_fn block_copy_async_co_entry(void *opaque)
-{
-    block_copy_common(opaque);
+    ret = call_state->ret;
+    g_free(call_state);
+
+    return ret;
 }
 
 BlockCopyCallState *block_copy_async(BlockCopyState *s,

block/bochs.c

@@ -116,7 +116,7 @@ static int bochs_open(BlockDriverState *bs, QDict *options, int flags,
         return -EINVAL;
     }
 
-    ret = bdrv_pread(bs->file, 0, &bochs, sizeof(bochs));
+    ret = bdrv_pread(bs->file, 0, sizeof(bochs), &bochs, 0);
     if (ret < 0) {
         return ret;
     }
@@ -150,8 +150,8 @@ static int bochs_open(BlockDriverState *bs, QDict *options, int flags,
         return -ENOMEM;
     }
 
-    ret = bdrv_pread(bs->file, le32_to_cpu(bochs.header), s->catalog_bitmap,
-                     s->catalog_size * 4);
+    ret = bdrv_pread(bs->file, le32_to_cpu(bochs.header), s->catalog_size * 4,
+                     s->catalog_bitmap, 0);
     if (ret < 0) {
         goto fail;
     }
@@ -224,8 +224,8 @@ static int64_t seek_to_sector(BlockDriverState *bs, int64_t sector_num)
                      (s->extent_blocks + s->bitmap_blocks));
 
     /* read in bitmap for current extent */
-    ret = bdrv_pread(bs->file, bitmap_offset + (extent_offset / 8),
-                     &bitmap_entry, 1);
+    ret = bdrv_pread(bs->file, bitmap_offset + (extent_offset / 8), 1,
+                     &bitmap_entry, 0);
     if (ret < 0) {
         return ret;
     }

block/cloop.c

@@ -78,7 +78,7 @@ static int cloop_open(BlockDriverState *bs, QDict *options, int flags,
     }
 
     /* read header */
-    ret = bdrv_pread(bs->file, 128, &s->block_size, 4);
+    ret = bdrv_pread(bs->file, 128, 4, &s->block_size, 0);
     if (ret < 0) {
         return ret;
     }
@@ -104,7 +104,7 @@ static int cloop_open(BlockDriverState *bs, QDict *options, int flags,
         return -EINVAL;
     }
 
-    ret = bdrv_pread(bs->file, 128 + 4, &s->n_blocks, 4);
+    ret = bdrv_pread(bs->file, 128 + 4, 4, &s->n_blocks, 0);
     if (ret < 0) {
         return ret;
     }
@@ -135,7 +135,7 @@ static int cloop_open(BlockDriverState *bs, QDict *options, int flags,
         return -ENOMEM;
     }
 
-    ret = bdrv_pread(bs->file, 128 + 4 + 4, s->offsets, offsets_size);
+    ret = bdrv_pread(bs->file, 128 + 4 + 4, offsets_size, s->offsets, 0);
     if (ret < 0) {
         goto fail;
     }
@@ -220,9 +220,9 @@ static inline int cloop_read_block(BlockDriverState *bs, int block_num)
         int ret;
         uint32_t bytes = s->offsets[block_num + 1] - s->offsets[block_num];
 
-        ret = bdrv_pread(bs->file, s->offsets[block_num],
-                         s->compressed_block, bytes);
-        if (ret != bytes) {
+        ret = bdrv_pread(bs->file, s->offsets[block_num], bytes,
+                         s->compressed_block, 0);
+        if (ret < 0) {
             return -1;
         }

block/commit.c

@@ -527,12 +527,12 @@ int bdrv_commit(BlockDriverState *bs)
             goto ro_cleanup;
         }
         if (ret) {
-            ret = blk_pread(src, offset, buf, n);
+            ret = blk_pread(src, offset, n, buf, 0);
             if (ret < 0) {
                 goto ro_cleanup;
             }
 
-            ret = blk_pwrite(backing, offset, buf, n, 0);
+            ret = blk_pwrite(backing, offset, n, buf, 0);
             if (ret < 0) {
                 goto ro_cleanup;
             }
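Across these conversions the byte count moves in front of the buffer and a `BdrvRequestFlags` argument is appended; just as importantly, the converted functions now return 0 on success (or a negative errno) rather than the number of bytes, which is why call sites change from `ret != bytes` to `ret < 0`. A self-contained toy stand-in sketches the new convention (the in-memory `disk` and `toy_blk_pread` are invented for illustration, not the QEMU API):

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Toy in-memory "device": 64 bytes of storage (illustration only). */
static uint8_t disk[64];

/* New-style argument order: offset, bytes, buf, flags.
 * Returns 0 on success or a negative errno, never a byte count. */
static int toy_blk_pread(int64_t offset, int64_t bytes, void *buf, int flags)
{
    (void)flags;
    if (offset < 0 || bytes < 0 || offset + bytes > (int64_t)sizeof(disk)) {
        return -EIO;
    }
    memcpy(buf, disk + offset, bytes);
    return 0;
}
```

Callers therefore check only `ret < 0`; a short read can no longer be signalled through a positive return value.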

block/copy-before-write.c

@@ -24,6 +24,7 @@
  */
 
 #include "qemu/osdep.h"
+#include "qapi/qmp/qjson.h"
 
 #include "sysemu/block-backend.h"
 #include "qemu/cutils.h"
@@ -40,6 +41,8 @@
 typedef struct BDRVCopyBeforeWriteState {
     BlockCopyState *bcs;
     BdrvChild *target;
+    OnCbwError on_cbw_error;
+    uint32_t cbw_timeout_ns;
 
     /*
      * @lock: protects access to @access_bitmap, @done_bitmap and
@@ -64,6 +67,14 @@ typedef struct BDRVCopyBeforeWriteState {
      * node. These areas must not be rewritten by guest.
      */
     BlockReqList frozen_read_reqs;
+
+    /*
+     * @snapshot_error is normally zero. But on first copy-before-write failure
+     * when @on_cbw_error == ON_CBW_ERROR_BREAK_SNAPSHOT, @snapshot_error takes
+     * value of this error (<0). After that all in-flight and further
+     * snapshot-API requests will fail with that error.
+     */
+    int snapshot_error;
 } BDRVCopyBeforeWriteState;
 
 static coroutine_fn int cbw_co_preadv(
@@ -73,6 +84,13 @@ static coroutine_fn int cbw_co_preadv(
     return bdrv_co_preadv(bs->file, offset, bytes, qiov, flags);
 }
 
+static void block_copy_cb(void *opaque)
+{
+    BlockDriverState *bs = opaque;
+
+    bdrv_dec_in_flight(bs);
+}
+
 /*
  * Do copy-before-write operation.
  *
@@ -94,16 +112,36 @@ static coroutine_fn int cbw_do_copy_before_write(BlockDriverState *bs,
         return 0;
     }
 
+    if (s->snapshot_error) {
+        return 0;
+    }
+
     off = QEMU_ALIGN_DOWN(offset, cluster_size);
     end = QEMU_ALIGN_UP(offset + bytes, cluster_size);
 
-    ret = block_copy(s->bcs, off, end - off, true);
-    if (ret < 0) {
+    /*
+     * Increase in_flight, so that in case of timed-out block-copy, the
+     * remaining background block_copy() request (which can't be immediately
+     * cancelled by timeout) is presented in bs->in_flight. This way we are
+     * sure that on bs close() we'll previously wait for all timed-out but yet
+     * running block_copy calls.
+     */
+    bdrv_inc_in_flight(bs);
+    ret = block_copy(s->bcs, off, end - off, true, s->cbw_timeout_ns,
+                     block_copy_cb, bs);
+    if (ret < 0 && s->on_cbw_error == ON_CBW_ERROR_BREAK_GUEST_WRITE) {
        return ret;
     }
 
     WITH_QEMU_LOCK_GUARD(&s->lock) {
-        bdrv_set_dirty_bitmap(s->done_bitmap, off, end - off);
+        if (ret < 0) {
+            assert(s->on_cbw_error == ON_CBW_ERROR_BREAK_SNAPSHOT);
+            if (!s->snapshot_error) {
+                s->snapshot_error = ret;
+            }
+        } else {
+            bdrv_set_dirty_bitmap(s->done_bitmap, off, end - off);
+        }
         reqlist_wait_all(&s->frozen_read_reqs, off, end - off, &s->lock);
     }
 
@@ -175,6 +213,11 @@ static BlockReq *cbw_snapshot_read_lock(BlockDriverState *bs,
 
     QEMU_LOCK_GUARD(&s->lock);
 
+    if (s->snapshot_error) {
+        g_free(req);
+        return NULL;
+    }
+
     if (bdrv_dirty_bitmap_next_zero(s->access_bitmap, offset, bytes) != -1) {
         g_free(req);
         return NULL;
@@ -328,46 +371,36 @@ static void cbw_child_perm(BlockDriverState *bs, BdrvChild *c,
     }
 }
 
-static bool cbw_parse_bitmap_option(QDict *options, BdrvDirtyBitmap **bitmap,
-                                    Error **errp)
+static BlockdevOptions *cbw_parse_options(QDict *options, Error **errp)
 {
-    QDict *bitmap_qdict = NULL;
-    BlockDirtyBitmap *bmp_param = NULL;
+    BlockdevOptions *opts = NULL;
     Visitor *v = NULL;
-    bool ret = false;
 
-    *bitmap = NULL;
+    qdict_put_str(options, "driver", "copy-before-write");
 
-    qdict_extract_subqdict(options, &bitmap_qdict, "bitmap.");
-    if (!qdict_size(bitmap_qdict)) {
-        ret = true;
-        goto out;
-    }
-
-    v = qobject_input_visitor_new_flat_confused(bitmap_qdict, errp);
+    v = qobject_input_visitor_new_flat_confused(options, errp);
     if (!v) {
         goto out;
     }
 
-    visit_type_BlockDirtyBitmap(v, NULL, &bmp_param, errp);
-    if (!bmp_param) {
+    visit_type_BlockdevOptions(v, NULL, &opts, errp);
+    if (!opts) {
         goto out;
     }
 
-    *bitmap = block_dirty_bitmap_lookup(bmp_param->node, bmp_param->name, NULL,
-                                        errp);
-    if (!*bitmap) {
-        goto out;
-    }
-
-    ret = true;
+    /*
+     * Delete options which we are going to parse through BlockdevOptions
+     * object for original options.
+     */
+    qdict_extract_subqdict(options, NULL, "bitmap");
+    qdict_del(options, "on-cbw-error");
+    qdict_del(options, "cbw-timeout");
 
 out:
-    qapi_free_BlockDirtyBitmap(bmp_param);
     visit_free(v);
-    qobject_unref(bitmap_qdict);
-    return ret;
+    qdict_del(options, "driver");
+
+    return opts;
 }
 
 static int cbw_open(BlockDriverState *bs, QDict *options, int flags,
@@ -376,6 +409,15 @@ static int cbw_open(BlockDriverState *bs, QDict *options, int flags,
     BDRVCopyBeforeWriteState *s = bs->opaque;
     BdrvDirtyBitmap *bitmap = NULL;
     int64_t cluster_size;
+    g_autoptr(BlockdevOptions) full_opts = NULL;
+    BlockdevOptionsCbw *opts;
+
+    full_opts = cbw_parse_options(options, errp);
+    if (!full_opts) {
+        return -EINVAL;
+    }
+    assert(full_opts->driver == BLOCKDEV_DRIVER_COPY_BEFORE_WRITE);
+    opts = &full_opts->u.copy_before_write;
 
     bs->file = bdrv_open_child(NULL, options, "file", bs, &child_of_bds,
                                BDRV_CHILD_FILTERED | BDRV_CHILD_PRIMARY,
@@ -390,9 +432,17 @@ static int cbw_open(BlockDriverState *bs, QDict *options, int flags,
         return -EINVAL;
     }
 
-    if (!cbw_parse_bitmap_option(options, &bitmap, errp)) {
-        return -EINVAL;
+    if (opts->has_bitmap) {
+        bitmap = block_dirty_bitmap_lookup(opts->bitmap->node,
+                                           opts->bitmap->name, NULL, errp);
+        if (!bitmap) {
+            return -EINVAL;
+        }
     }
+    s->on_cbw_error = opts->has_on_cbw_error ? opts->on_cbw_error :
+            ON_CBW_ERROR_BREAK_GUEST_WRITE;
+    s->cbw_timeout_ns = opts->has_cbw_timeout ?
+            opts->cbw_timeout * NANOSECONDS_PER_SECOND : 0;
 
     bs->total_sectors = bs->file->bs->total_sectors;
     bs->supported_write_flags = BDRV_REQ_WRITE_UNCHANGED |
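The two `on-cbw-error` policies above differ only in who observes a copy-before-write failure: `break-guest-write` fails the triggering guest write, while `break-snapshot` lets guest writes continue and instead poisons every later snapshot access via `snapshot_error`. A self-contained toy model of that decision (the `Toy*` names are invented, not QEMU's types):

```c
#include <errno.h>

enum ToyOnCbwError { TOY_BREAK_GUEST_WRITE, TOY_BREAK_SNAPSHOT };

struct ToyCbwState {
    enum ToyOnCbwError on_cbw_error;
    int snapshot_error;   /* first copy failure, 0 while healthy */
};

/* Toy analogue of cbw_do_copy_before_write(): decide what a failed
 * copy (copy_ret < 0) means for the guest write that triggered it. */
static int toy_handle_cbw_result(struct ToyCbwState *s, int copy_ret)
{
    if (copy_ret < 0 && s->on_cbw_error == TOY_BREAK_GUEST_WRITE) {
        return copy_ret;              /* guest write fails */
    }
    if (copy_ret < 0 && !s->snapshot_error) {
        s->snapshot_error = copy_ret; /* remember the first failure */
    }
    return 0;                         /* guest write succeeds */
}

/* Toy analogue of cbw_snapshot_read_lock(): snapshot access fails
 * once snapshot_error has been recorded. */
static int toy_snapshot_read(struct ToyCbwState *s)
{
    return s->snapshot_error;
}
```

With `break-snapshot`, a timed-out or failed copy sacrifices the backup consistency point rather than the guest's I/O.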

block/coroutines.h

@@ -63,25 +63,6 @@ nbd_co_do_establish_connection(BlockDriverState *bs, bool blocking,
                                Error **errp);
 
-int coroutine_fn
-blk_co_do_preadv(BlockBackend *blk, int64_t offset, int64_t bytes,
-                 QEMUIOVector *qiov, BdrvRequestFlags flags);
-
-int coroutine_fn
-blk_co_do_pwritev_part(BlockBackend *blk, int64_t offset, int64_t bytes,
-                       QEMUIOVector *qiov, size_t qiov_offset,
-                       BdrvRequestFlags flags);
-
-int coroutine_fn
-blk_co_do_ioctl(BlockBackend *blk, unsigned long int req, void *buf);
-
-int coroutine_fn
-blk_co_do_pdiscard(BlockBackend *blk, int64_t offset, int64_t bytes);
-
-int coroutine_fn blk_co_do_flush(BlockBackend *blk);
-
 /*
  * "I/O or GS" API functions. These functions can run without
  * the BQL, but only in one specific iothread/main loop.
@@ -90,14 +71,6 @@ int coroutine_fn blk_co_do_flush(BlockBackend *blk);
  * the "I/O or GS" API.
  */
 
-int generated_co_wrapper
-bdrv_preadv(BdrvChild *child, int64_t offset, unsigned int bytes,
-            QEMUIOVector *qiov, BdrvRequestFlags flags);
-
-int generated_co_wrapper
-bdrv_pwritev(BdrvChild *child, int64_t offset, unsigned int bytes,
-             QEMUIOVector *qiov, BdrvRequestFlags flags);
-
 int generated_co_wrapper
 bdrv_common_block_status_above(BlockDriverState *bs,
                                BlockDriverState *base,
@@ -112,21 +85,4 @@ bdrv_common_block_status_above(BlockDriverState *bs,
 int generated_co_wrapper
 nbd_do_establish_connection(BlockDriverState *bs, bool blocking, Error **errp);
 
-int generated_co_wrapper
-blk_do_preadv(BlockBackend *blk, int64_t offset, int64_t bytes,
-              QEMUIOVector *qiov, BdrvRequestFlags flags);
-
-int generated_co_wrapper
-blk_do_pwritev_part(BlockBackend *blk, int64_t offset, int64_t bytes,
-                    QEMUIOVector *qiov, size_t qiov_offset,
-                    BdrvRequestFlags flags);
-
-int generated_co_wrapper
-blk_do_ioctl(BlockBackend *blk, unsigned long int req, void *buf);
-
-int generated_co_wrapper
-blk_do_pdiscard(BlockBackend *blk, int64_t offset, int64_t bytes);
-
-int generated_co_wrapper blk_do_flush(BlockBackend *blk);
-
 #endif /* BLOCK_COROUTINES_H */

block/crypto.c

@@ -55,40 +55,40 @@ static int block_crypto_probe_generic(QCryptoBlockFormat format,
 }
 
-static ssize_t block_crypto_read_func(QCryptoBlock *block,
-                                      size_t offset,
-                                      uint8_t *buf,
-                                      size_t buflen,
-                                      void *opaque,
-                                      Error **errp)
+static int block_crypto_read_func(QCryptoBlock *block,
+                                  size_t offset,
+                                  uint8_t *buf,
+                                  size_t buflen,
+                                  void *opaque,
+                                  Error **errp)
 {
     BlockDriverState *bs = opaque;
     ssize_t ret;
 
-    ret = bdrv_pread(bs->file, offset, buf, buflen);
+    ret = bdrv_pread(bs->file, offset, buflen, buf, 0);
     if (ret < 0) {
         error_setg_errno(errp, -ret, "Could not read encryption header");
         return ret;
     }
-    return ret;
+    return 0;
 }
 
-static ssize_t block_crypto_write_func(QCryptoBlock *block,
-                                       size_t offset,
-                                       const uint8_t *buf,
-                                       size_t buflen,
-                                       void *opaque,
-                                       Error **errp)
+static int block_crypto_write_func(QCryptoBlock *block,
+                                   size_t offset,
+                                   const uint8_t *buf,
+                                   size_t buflen,
+                                   void *opaque,
+                                   Error **errp)
 {
     BlockDriverState *bs = opaque;
     ssize_t ret;
 
-    ret = bdrv_pwrite(bs->file, offset, buf, buflen);
+    ret = bdrv_pwrite(bs->file, offset, buflen, buf, 0);
     if (ret < 0) {
         error_setg_errno(errp, -ret, "Could not write encryption header");
         return ret;
     }
-    return ret;
+    return 0;
 }
 
@@ -99,28 +99,28 @@ struct BlockCryptoCreateData {
 };
 
-static ssize_t block_crypto_create_write_func(QCryptoBlock *block,
-                                              size_t offset,
-                                              const uint8_t *buf,
-                                              size_t buflen,
-                                              void *opaque,
-                                              Error **errp)
+static int block_crypto_create_write_func(QCryptoBlock *block,
+                                          size_t offset,
+                                          const uint8_t *buf,
+                                          size_t buflen,
+                                          void *opaque,
+                                          Error **errp)
 {
     struct BlockCryptoCreateData *data = opaque;
     ssize_t ret;
 
-    ret = blk_pwrite(data->blk, offset, buf, buflen, 0);
+    ret = blk_pwrite(data->blk, offset, buflen, buf, 0);
     if (ret < 0) {
         error_setg_errno(errp, -ret, "Could not write encryption header");
         return ret;
     }
-    return ret;
+    return 0;
 }
 
-static ssize_t block_crypto_create_init_func(QCryptoBlock *block,
-                                             size_t headerlen,
-                                             void *opaque,
-                                             Error **errp)
+static int block_crypto_create_init_func(QCryptoBlock *block,
+                                         size_t headerlen,
+                                         void *opaque,
+                                         Error **errp)
 {
     struct BlockCryptoCreateData *data = opaque;
     Error *local_error = NULL;
@@ -139,7 +139,7 @@ static ssize_t block_crypto_create_init_func(QCryptoBlock *block,
                             data->prealloc, 0, &local_error);
 
     if (ret >= 0) {
-        return ret;
+        return 0;
     }
 
 error:

block/dirty-bitmap.c

@@ -309,10 +309,7 @@ BdrvDirtyBitmap *bdrv_reclaim_dirty_bitmap_locked(BdrvDirtyBitmap *parent,
         return NULL;
     }
 
-    if (!hbitmap_merge(parent->bitmap, successor->bitmap, parent->bitmap)) {
-        error_setg(errp, "Merging of parent and successor bitmap failed");
-        return NULL;
-    }
+    hbitmap_merge(parent->bitmap, successor->bitmap, parent->bitmap);
 
     parent->disabled = successor->disabled;
     parent->busy = false;
@@ -912,13 +909,15 @@ bool bdrv_merge_dirty_bitmap(BdrvDirtyBitmap *dest, const BdrvDirtyBitmap *src,
         goto out;
     }
 
-    if (!hbitmap_can_merge(dest->bitmap, src->bitmap)) {
-        error_setg(errp, "Bitmaps are incompatible and can't be merged");
+    if (bdrv_dirty_bitmap_size(src) != bdrv_dirty_bitmap_size(dest)) {
+        error_setg(errp, "Bitmaps are of different sizes (destination size is %"
+                   PRId64 ", source size is %" PRId64 ") and can't be merged",
+                   bdrv_dirty_bitmap_size(dest), bdrv_dirty_bitmap_size(src));
         goto out;
     }
 
-    ret = bdrv_dirty_bitmap_merge_internal(dest, src, backup, false);
-    assert(ret);
+    bdrv_dirty_bitmap_merge_internal(dest, src, backup, false);
+    ret = true;
 
 out:
     bdrv_dirty_bitmaps_unlock(dest->bs);
@@ -932,17 +931,16 @@ out:
 
 /**
  * bdrv_dirty_bitmap_merge_internal: merge src into dest.
  * Does NOT check bitmap permissions; not suitable for use as public API.
+ * @dest, @src and @backup (if not NULL) must have same size.
  *
  * @backup: If provided, make a copy of dest here prior to merge.
  * @lock: If true, lock and unlock bitmaps on the way in/out.
- * returns true if the merge succeeded; false if unattempted.
  */
-bool bdrv_dirty_bitmap_merge_internal(BdrvDirtyBitmap *dest,
+void bdrv_dirty_bitmap_merge_internal(BdrvDirtyBitmap *dest,
                                       const BdrvDirtyBitmap *src,
                                       HBitmap **backup,
                                       bool lock)
 {
-    bool ret;
-
     IO_CODE();
 
     assert(!bdrv_dirty_bitmap_readonly(dest));
@@ -959,9 +957,9 @@ bool bdrv_dirty_bitmap_merge_internal(BdrvDirtyBitmap *dest,
     if (backup) {
         *backup = dest->bitmap;
         dest->bitmap = hbitmap_alloc(dest->size, hbitmap_granularity(*backup));
-        ret = hbitmap_merge(*backup, src->bitmap, dest->bitmap);
+        hbitmap_merge(*backup, src->bitmap, dest->bitmap);
     } else {
-        ret = hbitmap_merge(dest->bitmap, src->bitmap, dest->bitmap);
+        hbitmap_merge(dest->bitmap, src->bitmap, dest->bitmap);
     }
 
     if (lock) {
@@ -970,6 +968,4 @@ bool bdrv_dirty_bitmap_merge_internal(BdrvDirtyBitmap *dest,
             bdrv_dirty_bitmaps_unlock(src->bs);
         }
     }
-
-    return ret;
 }
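With `hbitmap_merge()` no longer able to fail, the only remaining failure mode for a merge is a size mismatch, which is now checked up front with the detailed error message above. A toy model of that rule (fixed-width bitmap with invented names; QEMU's HBitmap is a multi-level structure, not a flat array):

```c
#include <errno.h>
#include <stdint.h>
#include <stddef.h>

#define TOY_WORDS 4

struct ToyBitmap {
    int64_t size;                /* size in bits */
    uint64_t bits[TOY_WORDS];
};

/* Merge src into dest: reject different sizes, otherwise a plain OR. */
static int toy_bitmap_merge(struct ToyBitmap *dest, const struct ToyBitmap *src)
{
    if (dest->size != src->size) {
        return -EINVAL;          /* "Bitmaps are of different sizes ..." */
    }
    for (size_t i = 0; i < TOY_WORDS; i++) {
        dest->bits[i] |= src->bits[i];
    }
    return 0;
}
```

Once sizes match, the merge itself is infallible, which is what lets `bdrv_dirty_bitmap_merge_internal()` drop its boolean return.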

block/dmg.c

@@ -77,7 +77,7 @@ static int read_uint64(BlockDriverState *bs, int64_t offset, uint64_t *result)
     uint64_t buffer;
     int ret;
 
-    ret = bdrv_pread(bs->file, offset, &buffer, 8);
+    ret = bdrv_pread(bs->file, offset, 8, &buffer, 0);
     if (ret < 0) {
         return ret;
     }
@@ -91,7 +91,7 @@ static int read_uint32(BlockDriverState *bs, int64_t offset, uint32_t *result)
     uint32_t buffer;
     int ret;
 
-    ret = bdrv_pread(bs->file, offset, &buffer, 4);
+    ret = bdrv_pread(bs->file, offset, 4, &buffer, 0);
     if (ret < 0) {
         return ret;
     }
@@ -172,7 +172,7 @@ static int64_t dmg_find_koly_offset(BdrvChild *file, Error **errp)
         offset = length - 511 - 512;
     }
     length = length < 515 ? length : 515;
-    ret = bdrv_pread(file, offset, buffer, length);
+    ret = bdrv_pread(file, offset, length, buffer, 0);
     if (ret < 0) {
         error_setg_errno(errp, -ret, "Failed while reading UDIF trailer");
         return ret;
@@ -352,7 +352,7 @@ static int dmg_read_resource_fork(BlockDriverState *bs, DmgHeaderState *ds,
         offset += 4;
 
         buffer = g_realloc(buffer, count);
-        ret = bdrv_pread(bs->file, offset, buffer, count);
+        ret = bdrv_pread(bs->file, offset, count, buffer, 0);
         if (ret < 0) {
             goto fail;
         }
@@ -389,8 +389,8 @@ static int dmg_read_plist_xml(BlockDriverState *bs, DmgHeaderState *ds,
     buffer = g_malloc(info_length + 1);
     buffer[info_length] = '\0';
-    ret = bdrv_pread(bs->file, info_begin, buffer, info_length);
-    if (ret != info_length) {
+    ret = bdrv_pread(bs->file, info_begin, info_length, buffer, 0);
+    if (ret < 0) {
         ret = -EINVAL;
         goto fail;
     }
@@ -609,9 +609,9 @@ static inline int dmg_read_chunk(BlockDriverState *bs, uint64_t sector_num)
         case UDZO: { /* zlib compressed */
             /* we need to buffer, because only the chunk as whole can be
              * inflated. */
-            ret = bdrv_pread(bs->file, s->offsets[chunk],
-                             s->compressed_chunk, s->lengths[chunk]);
-            if (ret != s->lengths[chunk]) {
+            ret = bdrv_pread(bs->file, s->offsets[chunk], s->lengths[chunk],
+                             s->compressed_chunk, 0);
+            if (ret < 0) {
                 return -1;
             }
@@ -635,9 +635,9 @@ static inline int dmg_read_chunk(BlockDriverState *bs, uint64_t sector_num)
             }
             /* we need to buffer, because only the chunk as whole can be
              * inflated. */
-            ret = bdrv_pread(bs->file, s->offsets[chunk],
-                             s->compressed_chunk, s->lengths[chunk]);
-            if (ret != s->lengths[chunk]) {
+            ret = bdrv_pread(bs->file, s->offsets[chunk], s->lengths[chunk],
+                             s->compressed_chunk, 0);
+            if (ret < 0) {
                 return -1;
             }
@@ -656,9 +656,9 @@ static inline int dmg_read_chunk(BlockDriverState *bs, uint64_t sector_num)
             }
             /* we need to buffer, because only the chunk as whole can be
              * inflated. */
-            ret = bdrv_pread(bs->file, s->offsets[chunk],
-                             s->compressed_chunk, s->lengths[chunk]);
-            if (ret != s->lengths[chunk]) {
+            ret = bdrv_pread(bs->file, s->offsets[chunk], s->lengths[chunk],
+                             s->compressed_chunk, 0);
+            if (ret < 0) {
                 return -1;
             }
@@ -672,9 +672,9 @@ static inline int dmg_read_chunk(BlockDriverState *bs, uint64_t sector_num)
             }
             break;
         case UDRW: /* copy */
-            ret = bdrv_pread(bs->file, s->offsets[chunk],
-                             s->uncompressed_chunk, s->lengths[chunk]);
-            if (ret != s->lengths[chunk]) {
+            ret = bdrv_pread(bs->file, s->offsets[chunk], s->lengths[chunk],
+                             s->uncompressed_chunk, 0);
+            if (ret < 0) {
                 return -1;
             }
             break;

block/export/export.c

@@ -26,6 +26,9 @@
 #ifdef CONFIG_VHOST_USER_BLK_SERVER
 #include "vhost-user-blk-server.h"
 #endif
+#ifdef CONFIG_VDUSE_BLK_EXPORT
+#include "vduse-blk.h"
+#endif
 
 static const BlockExportDriver *blk_exp_drivers[] = {
     &blk_exp_nbd,
@@ -35,6 +38,9 @@ static const BlockExportDriver *blk_exp_drivers[] = {
 #ifdef CONFIG_FUSE
     &blk_exp_fuse,
 #endif
+#ifdef CONFIG_VDUSE_BLK_EXPORT
+    &blk_exp_vduse_blk,
+#endif
 };
 
 /* Only accessed from the main thread */

block/export/fuse.c

@@ -554,7 +554,7 @@ static void fuse_read(fuse_req_t req, fuse_ino_t inode,
         return;
     }
 
-    ret = blk_pread(exp->common.blk, offset, buf, size);
+    ret = blk_pread(exp->common.blk, offset, size, buf, 0);
     if (ret >= 0) {
         fuse_reply_buf(req, buf, size);
     } else {
@@ -607,7 +607,7 @@ static void fuse_write(fuse_req_t req, fuse_ino_t inode, const char *buf,
         }
     }
 
-    ret = blk_pwrite(exp->common.blk, offset, buf, size, 0);
+    ret = blk_pwrite(exp->common.blk, offset, size, buf, 0);
     if (ret >= 0) {
         fuse_reply_write(req, size);
     } else {

block/export/meson.build

@@ -1,7 +1,12 @@
 blockdev_ss.add(files('export.c'))
 if have_vhost_user_blk_server
-  blockdev_ss.add(files('vhost-user-blk-server.c'))
+  blockdev_ss.add(files('vhost-user-blk-server.c', 'virtio-blk-handler.c'))
 endif
 blockdev_ss.add(when: fuse, if_true: files('fuse.c'))
+
+if have_vduse_blk_export
+  blockdev_ss.add(files('vduse-blk.c', 'virtio-blk-handler.c'))
+  blockdev_ss.add(libvduse)
+endif

block/export/vduse-blk.c (new file, 374 lines)

@@ -0,0 +1,374 @@
/*
* Export QEMU block device via VDUSE
*
* Copyright (C) 2022 Bytedance Inc. and/or its affiliates. All rights reserved.
*
* Author:
* Xie Yongji <xieyongji@bytedance.com>
*
* This work is licensed under the terms of the GNU GPL, version 2 or
* later. See the COPYING file in the top-level directory.
*/
#include <sys/eventfd.h>
#include "qemu/osdep.h"
#include "qapi/error.h"
#include "block/export.h"
#include "qemu/error-report.h"
#include "util/block-helpers.h"
#include "subprojects/libvduse/libvduse.h"
#include "virtio-blk-handler.h"
#include "standard-headers/linux/virtio_blk.h"
#define VDUSE_DEFAULT_NUM_QUEUE 1
#define VDUSE_DEFAULT_QUEUE_SIZE 256
typedef struct VduseBlkExport {
BlockExport export;
VirtioBlkHandler handler;
VduseDev *dev;
uint16_t num_queues;
char *recon_file;
unsigned int inflight;
} VduseBlkExport;
typedef struct VduseBlkReq {
VduseVirtqElement elem;
VduseVirtq *vq;
} VduseBlkReq;
static void vduse_blk_inflight_inc(VduseBlkExport *vblk_exp)
{
vblk_exp->inflight++;
}
static void vduse_blk_inflight_dec(VduseBlkExport *vblk_exp)
{
if (--vblk_exp->inflight == 0) {
aio_wait_kick();
}
}
static void vduse_blk_req_complete(VduseBlkReq *req, size_t in_len)
{
vduse_queue_push(req->vq, &req->elem, in_len);
vduse_queue_notify(req->vq);
free(req);
}
static void coroutine_fn vduse_blk_virtio_process_req(void *opaque)
{
VduseBlkReq *req = opaque;
VduseVirtq *vq = req->vq;
VduseDev *dev = vduse_queue_get_dev(vq);
VduseBlkExport *vblk_exp = vduse_dev_get_priv(dev);
VirtioBlkHandler *handler = &vblk_exp->handler;
VduseVirtqElement *elem = &req->elem;
struct iovec *in_iov = elem->in_sg;
struct iovec *out_iov = elem->out_sg;
unsigned in_num = elem->in_num;
unsigned out_num = elem->out_num;
int in_len;
in_len = virtio_blk_process_req(handler, in_iov,
out_iov, in_num, out_num);
if (in_len < 0) {
free(req);
return;
}
vduse_blk_req_complete(req, in_len);
vduse_blk_inflight_dec(vblk_exp);
}
static void vduse_blk_vq_handler(VduseDev *dev, VduseVirtq *vq)
{
VduseBlkExport *vblk_exp = vduse_dev_get_priv(dev);
while (1) {
VduseBlkReq *req;
req = vduse_queue_pop(vq, sizeof(VduseBlkReq));
if (!req) {
break;
}
req->vq = vq;
Coroutine *co =
qemu_coroutine_create(vduse_blk_virtio_process_req, req);
vduse_blk_inflight_inc(vblk_exp);
qemu_coroutine_enter(co);
}
}
static void on_vduse_vq_kick(void *opaque)
{
VduseVirtq *vq = opaque;
VduseDev *dev = vduse_queue_get_dev(vq);
int fd = vduse_queue_get_fd(vq);
eventfd_t kick_data;
if (eventfd_read(fd, &kick_data) == -1) {
error_report("failed to read data from eventfd");
return;
}
vduse_blk_vq_handler(dev, vq);
}
static void vduse_blk_enable_queue(VduseDev *dev, VduseVirtq *vq)
{
VduseBlkExport *vblk_exp = vduse_dev_get_priv(dev);
aio_set_fd_handler(vblk_exp->export.ctx, vduse_queue_get_fd(vq),
true, on_vduse_vq_kick, NULL, NULL, NULL, vq);
/* Make sure we don't miss any kick after reconnecting */
eventfd_write(vduse_queue_get_fd(vq), 1);
}
static void vduse_blk_disable_queue(VduseDev *dev, VduseVirtq *vq)
{
VduseBlkExport *vblk_exp = vduse_dev_get_priv(dev);
aio_set_fd_handler(vblk_exp->export.ctx, vduse_queue_get_fd(vq),
true, NULL, NULL, NULL, NULL, NULL);
}
static const VduseOps vduse_blk_ops = {
.enable_queue = vduse_blk_enable_queue,
.disable_queue = vduse_blk_disable_queue,
};
static void on_vduse_dev_kick(void *opaque)
{
VduseDev *dev = opaque;
vduse_dev_handler(dev);
}
static void vduse_blk_attach_ctx(VduseBlkExport *vblk_exp, AioContext *ctx)
{
int i;
aio_set_fd_handler(vblk_exp->export.ctx, vduse_dev_get_fd(vblk_exp->dev),
true, on_vduse_dev_kick, NULL, NULL, NULL,
vblk_exp->dev);
for (i = 0; i < vblk_exp->num_queues; i++) {
VduseVirtq *vq = vduse_dev_get_queue(vblk_exp->dev, i);
int fd = vduse_queue_get_fd(vq);
if (fd < 0) {
continue;
}
aio_set_fd_handler(vblk_exp->export.ctx, fd, true,
on_vduse_vq_kick, NULL, NULL, NULL, vq);
}
}
static void vduse_blk_detach_ctx(VduseBlkExport *vblk_exp)
{
int i;
for (i = 0; i < vblk_exp->num_queues; i++) {
VduseVirtq *vq = vduse_dev_get_queue(vblk_exp->dev, i);
int fd = vduse_queue_get_fd(vq);
if (fd < 0) {
continue;
}
aio_set_fd_handler(vblk_exp->export.ctx, fd,
true, NULL, NULL, NULL, NULL, NULL);
}
aio_set_fd_handler(vblk_exp->export.ctx, vduse_dev_get_fd(vblk_exp->dev),
true, NULL, NULL, NULL, NULL, NULL);
AIO_WAIT_WHILE(vblk_exp->export.ctx, vblk_exp->inflight > 0);
}
static void blk_aio_attached(AioContext *ctx, void *opaque)
{
VduseBlkExport *vblk_exp = opaque;
vblk_exp->export.ctx = ctx;
vduse_blk_attach_ctx(vblk_exp, ctx);
}
static void blk_aio_detach(void *opaque)
{
VduseBlkExport *vblk_exp = opaque;
vduse_blk_detach_ctx(vblk_exp);
vblk_exp->export.ctx = NULL;
}
static void vduse_blk_resize(void *opaque)
{
BlockExport *exp = opaque;
VduseBlkExport *vblk_exp = container_of(exp, VduseBlkExport, export);
struct virtio_blk_config config;
config.capacity =
cpu_to_le64(blk_getlength(exp->blk) >> VIRTIO_BLK_SECTOR_BITS);
vduse_dev_update_config(vblk_exp->dev, sizeof(config.capacity),
offsetof(struct virtio_blk_config, capacity),
(char *)&config.capacity);
}
static const BlockDevOps vduse_block_ops = {
.resize_cb = vduse_blk_resize,
};
static int vduse_blk_exp_create(BlockExport *exp, BlockExportOptions *opts,
Error **errp)
{
VduseBlkExport *vblk_exp = container_of(exp, VduseBlkExport, export);
BlockExportOptionsVduseBlk *vblk_opts = &opts->u.vduse_blk;
uint64_t logical_block_size = VIRTIO_BLK_SECTOR_SIZE;
uint16_t num_queues = VDUSE_DEFAULT_NUM_QUEUE;
uint16_t queue_size = VDUSE_DEFAULT_QUEUE_SIZE;
Error *local_err = NULL;
struct virtio_blk_config config = { 0 };
uint64_t features;
int i, ret;
if (vblk_opts->has_num_queues) {
num_queues = vblk_opts->num_queues;
if (num_queues == 0) {
error_setg(errp, "num-queues must be greater than 0");
return -EINVAL;
}
}
if (vblk_opts->has_queue_size) {
queue_size = vblk_opts->queue_size;
if (queue_size <= 2 || !is_power_of_2(queue_size) ||
queue_size > VIRTQUEUE_MAX_SIZE) {
error_setg(errp, "queue-size is invalid");
return -EINVAL;
}
}
if (vblk_opts->has_logical_block_size) {
logical_block_size = vblk_opts->logical_block_size;
check_block_size(exp->id, "logical-block-size", logical_block_size,
&local_err);
if (local_err) {
error_propagate(errp, local_err);
return -EINVAL;
}
}
vblk_exp->num_queues = num_queues;
vblk_exp->handler.blk = exp->blk;
vblk_exp->handler.serial = g_strdup(vblk_opts->has_serial ?
vblk_opts->serial : "");
vblk_exp->handler.logical_block_size = logical_block_size;
vblk_exp->handler.writable = opts->writable;
config.capacity =
cpu_to_le64(blk_getlength(exp->blk) >> VIRTIO_BLK_SECTOR_BITS);
config.seg_max = cpu_to_le32(queue_size - 2);
config.min_io_size = cpu_to_le16(1);
config.opt_io_size = cpu_to_le32(1);
config.num_queues = cpu_to_le16(num_queues);
config.blk_size = cpu_to_le32(logical_block_size);
config.max_discard_sectors = cpu_to_le32(VIRTIO_BLK_MAX_DISCARD_SECTORS);
config.max_discard_seg = cpu_to_le32(1);
config.discard_sector_alignment =
cpu_to_le32(logical_block_size >> VIRTIO_BLK_SECTOR_BITS);
config.max_write_zeroes_sectors =
cpu_to_le32(VIRTIO_BLK_MAX_WRITE_ZEROES_SECTORS);
config.max_write_zeroes_seg = cpu_to_le32(1);
features = vduse_get_virtio_features() |
(1ULL << VIRTIO_BLK_F_SEG_MAX) |
(1ULL << VIRTIO_BLK_F_TOPOLOGY) |
(1ULL << VIRTIO_BLK_F_BLK_SIZE) |
(1ULL << VIRTIO_BLK_F_FLUSH) |
(1ULL << VIRTIO_BLK_F_DISCARD) |
(1ULL << VIRTIO_BLK_F_WRITE_ZEROES);
if (num_queues > 1) {
features |= 1ULL << VIRTIO_BLK_F_MQ;
}
if (!opts->writable) {
features |= 1ULL << VIRTIO_BLK_F_RO;
}
vblk_exp->dev = vduse_dev_create(vblk_opts->name, VIRTIO_ID_BLOCK, 0,
features, num_queues,
sizeof(struct virtio_blk_config),
(char *)&config, &vduse_blk_ops,
vblk_exp);
if (!vblk_exp->dev) {
error_setg(errp, "failed to create vduse device");
ret = -ENOMEM;
goto err_dev;
}
vblk_exp->recon_file = g_strdup_printf("%s/vduse-blk-%s",
g_get_tmp_dir(), vblk_opts->name);
if (vduse_set_reconnect_log_file(vblk_exp->dev, vblk_exp->recon_file)) {
error_setg(errp, "failed to set reconnect log file");
ret = -EINVAL;
goto err;
}
for (i = 0; i < num_queues; i++) {
vduse_dev_setup_queue(vblk_exp->dev, i, queue_size);
}
aio_set_fd_handler(exp->ctx, vduse_dev_get_fd(vblk_exp->dev), true,
on_vduse_dev_kick, NULL, NULL, NULL, vblk_exp->dev);
blk_add_aio_context_notifier(exp->blk, blk_aio_attached, blk_aio_detach,
vblk_exp);
blk_set_dev_ops(exp->blk, &vduse_block_ops, exp);
return 0;
err:
vduse_dev_destroy(vblk_exp->dev);
g_free(vblk_exp->recon_file);
err_dev:
g_free(vblk_exp->handler.serial);
return ret;
}
static void vduse_blk_exp_delete(BlockExport *exp)
{
VduseBlkExport *vblk_exp = container_of(exp, VduseBlkExport, export);
int ret;
blk_remove_aio_context_notifier(exp->blk, blk_aio_attached, blk_aio_detach,
vblk_exp);
blk_set_dev_ops(exp->blk, NULL, NULL);
ret = vduse_dev_destroy(vblk_exp->dev);
if (ret != -EBUSY) {
unlink(vblk_exp->recon_file);
}
g_free(vblk_exp->recon_file);
g_free(vblk_exp->handler.serial);
}
static void vduse_blk_exp_request_shutdown(BlockExport *exp)
{
VduseBlkExport *vblk_exp = container_of(exp, VduseBlkExport, export);
aio_context_acquire(vblk_exp->export.ctx);
vduse_blk_detach_ctx(vblk_exp);
aio_context_release(vblk_exp->export.ctx);
}
const BlockExportDriver blk_exp_vduse_blk = {
.type = BLOCK_EXPORT_TYPE_VDUSE_BLK,
.instance_size = sizeof(VduseBlkExport),
.create = vduse_blk_exp_create,
.delete = vduse_blk_exp_delete,
.request_shutdown = vduse_blk_exp_request_shutdown,
};
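The queue-size validation in vduse_blk_exp_create() above (greater than 2, a power of two, at most VIRTQUEUE_MAX_SIZE) can be sketched as a stand-alone check. The helper names below are invented for illustration; only the limit 1024 matches QEMU's VIRTQUEUE_MAX_SIZE.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define VIRTQUEUE_MAX_SIZE 1024 /* QEMU's per-virtqueue limit */

/* Same test as QEMU's is_power_of_2() utility */
static bool is_power_of_2(uint64_t value)
{
    return value && !(value & (value - 1));
}

/* Mirrors the queue-size check in vduse_blk_exp_create() */
static bool queue_size_valid(uint16_t queue_size)
{
    return queue_size > 2 && is_power_of_2(queue_size) &&
           queue_size <= VIRTQUEUE_MAX_SIZE;
}
```

Sizes 4 through 1024 (powers of two) pass; 2 is rejected because a virtqueue that small cannot hold a request with data plus status descriptors.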

block/export/vduse-blk.h (new file)
@@ -0,0 +1,20 @@
/*
* Export QEMU block device via VDUSE
*
* Copyright (C) 2022 Bytedance Inc. and/or its affiliates. All rights reserved.
*
* Author:
* Xie Yongji <xieyongji@bytedance.com>
*
* This work is licensed under the terms of the GNU GPL, version 2 or
* later. See the COPYING file in the top-level directory.
*/
#ifndef VDUSE_BLK_H
#define VDUSE_BLK_H
#include "block/export.h"
extern const BlockExportDriver blk_exp_vduse_blk;
#endif /* VDUSE_BLK_H */

@@ -17,31 +17,15 @@
 #include "vhost-user-blk-server.h"
 #include "qapi/error.h"
 #include "qom/object_interfaces.h"
-#include "sysemu/block-backend.h"
 #include "util/block-helpers.h"
+#include "virtio-blk-handler.h"
-/*
- * Sector units are 512 bytes regardless of the
- * virtio_blk_config->blk_size value.
- */
-#define VIRTIO_BLK_SECTOR_BITS 9
-#define VIRTIO_BLK_SECTOR_SIZE (1ull << VIRTIO_BLK_SECTOR_BITS)
 enum {
     VHOST_USER_BLK_NUM_QUEUES_DEFAULT = 1,
-    VHOST_USER_BLK_MAX_DISCARD_SECTORS = 32768,
-    VHOST_USER_BLK_MAX_WRITE_ZEROES_SECTORS = 32768,
-};
-struct virtio_blk_inhdr {
-    unsigned char status;
 };
 typedef struct VuBlkReq {
     VuVirtqElement elem;
-    int64_t sector_num;
-    size_t size;
-    struct virtio_blk_inhdr *in;
-    struct virtio_blk_outhdr out;
     VuServer *server;
     struct VuVirtq *vq;
 } VuBlkReq;
@@ -50,248 +34,44 @@ typedef struct VuBlkReq {
 typedef struct {
     BlockExport export;
     VuServer vu_server;
-    uint32_t blk_size;
+    VirtioBlkHandler handler;
     QIOChannelSocket *sioc;
     struct virtio_blk_config blkcfg;
-    bool writable;
 } VuBlkExport;
-static void vu_blk_req_complete(VuBlkReq *req)
+static void vu_blk_req_complete(VuBlkReq *req, size_t in_len)
 {
     VuDev *vu_dev = &req->server->vu_dev;
-    /* IO size with 1 extra status byte */
-    vu_queue_push(vu_dev, req->vq, &req->elem, req->size + 1);
+    vu_queue_push(vu_dev, req->vq, &req->elem, in_len);
     vu_queue_notify(vu_dev, req->vq);
     free(req);
 }
-static bool vu_blk_sect_range_ok(VuBlkExport *vexp, uint64_t sector,
-                                 size_t size)
-{
-    uint64_t nb_sectors;
-    uint64_t total_sectors;
-    if (size % VIRTIO_BLK_SECTOR_SIZE) {
-        return false;
-    }
-    nb_sectors = size >> VIRTIO_BLK_SECTOR_BITS;
-    QEMU_BUILD_BUG_ON(BDRV_SECTOR_SIZE != VIRTIO_BLK_SECTOR_SIZE);
-    if (nb_sectors > BDRV_REQUEST_MAX_SECTORS) {
-        return false;
-    }
-    if ((sector << VIRTIO_BLK_SECTOR_BITS) % vexp->blk_size) {
-        return false;
-    }
-    blk_get_geometry(vexp->export.blk, &total_sectors);
-    if (sector > total_sectors || nb_sectors > total_sectors - sector) {
-        return false;
-    }
-    return true;
-}
-static int coroutine_fn
-vu_blk_discard_write_zeroes(VuBlkExport *vexp, struct iovec *iov,
-                            uint32_t iovcnt, uint32_t type)
-{
-    BlockBackend *blk = vexp->export.blk;
-    struct virtio_blk_discard_write_zeroes desc;
-    ssize_t size;
-    uint64_t sector;
-    uint32_t num_sectors;
-    uint32_t max_sectors;
-    uint32_t flags;
-    int bytes;
-    /* Only one desc is currently supported */
-    if (unlikely(iov_size(iov, iovcnt) > sizeof(desc))) {
-        return VIRTIO_BLK_S_UNSUPP;
-    }
-    size = iov_to_buf(iov, iovcnt, 0, &desc, sizeof(desc));
-    if (unlikely(size != sizeof(desc))) {
-        error_report("Invalid size %zd, expected %zu", size, sizeof(desc));
-        return VIRTIO_BLK_S_IOERR;
-    }
-    sector = le64_to_cpu(desc.sector);
-    num_sectors = le32_to_cpu(desc.num_sectors);
-    flags = le32_to_cpu(desc.flags);
-    max_sectors = (type == VIRTIO_BLK_T_WRITE_ZEROES) ?
-                  VHOST_USER_BLK_MAX_WRITE_ZEROES_SECTORS :
-                  VHOST_USER_BLK_MAX_DISCARD_SECTORS;
-    /* This check ensures that 'bytes' fits in an int */
-    if (unlikely(num_sectors > max_sectors)) {
-        return VIRTIO_BLK_S_IOERR;
-    }
-    bytes = num_sectors << VIRTIO_BLK_SECTOR_BITS;
-    if (unlikely(!vu_blk_sect_range_ok(vexp, sector, bytes))) {
-        return VIRTIO_BLK_S_IOERR;
-    }
-    /*
-     * The device MUST set the status byte to VIRTIO_BLK_S_UNSUPP for discard
-     * and write zeroes commands if any unknown flag is set.
-     */
-    if (unlikely(flags & ~VIRTIO_BLK_WRITE_ZEROES_FLAG_UNMAP)) {
-        return VIRTIO_BLK_S_UNSUPP;
-    }
-    if (type == VIRTIO_BLK_T_WRITE_ZEROES) {
-        int blk_flags = 0;
-        if (flags & VIRTIO_BLK_WRITE_ZEROES_FLAG_UNMAP) {
-            blk_flags |= BDRV_REQ_MAY_UNMAP;
-        }
-        if (blk_co_pwrite_zeroes(blk, sector << VIRTIO_BLK_SECTOR_BITS,
-                                 bytes, blk_flags) == 0) {
-            return VIRTIO_BLK_S_OK;
-        }
-    } else if (type == VIRTIO_BLK_T_DISCARD) {
-        /*
-         * The device MUST set the status byte to VIRTIO_BLK_S_UNSUPP for
-         * discard commands if the unmap flag is set.
-         */
-        if (unlikely(flags & VIRTIO_BLK_WRITE_ZEROES_FLAG_UNMAP)) {
-            return VIRTIO_BLK_S_UNSUPP;
-        }
-        if (blk_co_pdiscard(blk, sector << VIRTIO_BLK_SECTOR_BITS,
-                            bytes) == 0) {
-            return VIRTIO_BLK_S_OK;
-        }
-    }
-    return VIRTIO_BLK_S_IOERR;
-}
 /* Called with server refcount increased, must decrease before returning */
 static void coroutine_fn vu_blk_virtio_process_req(void *opaque)
 {
     VuBlkReq *req = opaque;
     VuServer *server = req->server;
     VuVirtqElement *elem = &req->elem;
-    uint32_t type;
     VuBlkExport *vexp = container_of(server, VuBlkExport, vu_server);
-    BlockBackend *blk = vexp->export.blk;
+    VirtioBlkHandler *handler = &vexp->handler;
     struct iovec *in_iov = elem->in_sg;
     struct iovec *out_iov = elem->out_sg;
     unsigned in_num = elem->in_num;
     unsigned out_num = elem->out_num;
+    int in_len;
-    /* refer to hw/block/virtio_blk.c */
-    if (elem->out_num < 1 || elem->in_num < 1) {
-        error_report("virtio-blk request missing headers");
-        goto err;
+    in_len = virtio_blk_process_req(handler, in_iov, out_iov,
+                                    in_num, out_num);
+    if (in_len < 0) {
+        free(req);
+        vhost_user_server_unref(server);
+        return;
     }
-    if (unlikely(iov_to_buf(out_iov, out_num, 0, &req->out,
-                            sizeof(req->out)) != sizeof(req->out))) {
-        error_report("virtio-blk request outhdr too short");
-        goto err;
-    }
-    iov_discard_front(&out_iov, &out_num, sizeof(req->out));
-    if (in_iov[in_num - 1].iov_len < sizeof(struct virtio_blk_inhdr)) {
-        error_report("virtio-blk request inhdr too short");
-        goto err;
-    }
-    /* We always touch the last byte, so just see how big in_iov is. */
-    req->in = (void *)in_iov[in_num - 1].iov_base
-              + in_iov[in_num - 1].iov_len
-              - sizeof(struct virtio_blk_inhdr);
-    iov_discard_back(in_iov, &in_num, sizeof(struct virtio_blk_inhdr));
-    type = le32_to_cpu(req->out.type);
-    switch (type & ~VIRTIO_BLK_T_BARRIER) {
-    case VIRTIO_BLK_T_IN:
-    case VIRTIO_BLK_T_OUT: {
-        QEMUIOVector qiov;
-        int64_t offset;
-        ssize_t ret = 0;
-        bool is_write = type & VIRTIO_BLK_T_OUT;
-        req->sector_num = le64_to_cpu(req->out.sector);
-        if (is_write && !vexp->writable) {
-            req->in->status = VIRTIO_BLK_S_IOERR;
-            break;
-        }
-        if (is_write) {
-            qemu_iovec_init_external(&qiov, out_iov, out_num);
-        } else {
-            qemu_iovec_init_external(&qiov, in_iov, in_num);
-        }
-        if (unlikely(!vu_blk_sect_range_ok(vexp,
-                                           req->sector_num,
-                                           qiov.size))) {
-            req->in->status = VIRTIO_BLK_S_IOERR;
-            break;
-        }
-        offset = req->sector_num << VIRTIO_BLK_SECTOR_BITS;
-        if (is_write) {
-            ret = blk_co_pwritev(blk, offset, qiov.size, &qiov, 0);
-        } else {
-            ret = blk_co_preadv(blk, offset, qiov.size, &qiov, 0);
-        }
-        if (ret >= 0) {
-            req->in->status = VIRTIO_BLK_S_OK;
-        } else {
-            req->in->status = VIRTIO_BLK_S_IOERR;
-        }
-        break;
-    }
-    case VIRTIO_BLK_T_FLUSH:
-        if (blk_co_flush(blk) == 0) {
-            req->in->status = VIRTIO_BLK_S_OK;
-        } else {
-            req->in->status = VIRTIO_BLK_S_IOERR;
-        }
-        break;
-    case VIRTIO_BLK_T_GET_ID: {
-        size_t size = MIN(iov_size(&elem->in_sg[0], in_num),
-                          VIRTIO_BLK_ID_BYTES);
-        snprintf(elem->in_sg[0].iov_base, size, "%s", "vhost_user_blk");
-        req->in->status = VIRTIO_BLK_S_OK;
-        req->size = elem->in_sg[0].iov_len;
-        break;
-    }
-    case VIRTIO_BLK_T_DISCARD:
-    case VIRTIO_BLK_T_WRITE_ZEROES: {
-        if (!vexp->writable) {
-            req->in->status = VIRTIO_BLK_S_IOERR;
-            break;
-        }
-        req->in->status = vu_blk_discard_write_zeroes(vexp, out_iov, out_num,
-                                                      type);
-        break;
-    }
-    default:
-        req->in->status = VIRTIO_BLK_S_UNSUPP;
-        break;
-    }
-    vu_blk_req_complete(req);
-    vhost_user_server_unref(server);
-    return;
-err:
-    free(req);
+    vu_blk_req_complete(req, in_len);
     vhost_user_server_unref(server);
 }
@@ -348,7 +128,7 @@ static uint64_t vu_blk_get_features(VuDev *dev)
            1ull << VIRTIO_RING_F_EVENT_IDX |
            1ull << VHOST_USER_F_PROTOCOL_FEATURES;
-    if (!vexp->writable) {
+    if (!vexp->handler.writable) {
         features |= 1ull << VIRTIO_BLK_F_RO;
     }
@@ -455,12 +235,12 @@ vu_blk_initialize_config(BlockDriverState *bs,
     config->opt_io_size = cpu_to_le32(1);
     config->num_queues = cpu_to_le16(num_queues);
     config->max_discard_sectors =
-        cpu_to_le32(VHOST_USER_BLK_MAX_DISCARD_SECTORS);
+        cpu_to_le32(VIRTIO_BLK_MAX_DISCARD_SECTORS);
     config->max_discard_seg = cpu_to_le32(1);
     config->discard_sector_alignment =
         cpu_to_le32(blk_size >> VIRTIO_BLK_SECTOR_BITS);
     config->max_write_zeroes_sectors
-        = cpu_to_le32(VHOST_USER_BLK_MAX_WRITE_ZEROES_SECTORS);
+        = cpu_to_le32(VIRTIO_BLK_MAX_WRITE_ZEROES_SECTORS);
     config->max_write_zeroes_seg = cpu_to_le32(1);
 }
@@ -480,7 +260,6 @@ static int vu_blk_exp_create(BlockExport *exp, BlockExportOptions *opts,
     uint64_t logical_block_size;
     uint16_t num_queues = VHOST_USER_BLK_NUM_QUEUES_DEFAULT;
-    vexp->writable = opts->writable;
     vexp->blkcfg.wce = 0;
     if (vu_opts->has_logical_block_size) {
@@ -494,8 +273,6 @@ static int vu_blk_exp_create(BlockExport *exp, BlockExportOptions *opts,
         error_propagate(errp, local_err);
         return -EINVAL;
     }
-    vexp->blk_size = logical_block_size;
-    blk_set_guest_block_size(exp->blk, logical_block_size);
     if (vu_opts->has_num_queues) {
         num_queues = vu_opts->num_queues;
@@ -504,6 +281,10 @@ static int vu_blk_exp_create(BlockExport *exp, BlockExportOptions *opts,
         error_setg(errp, "num-queues must be greater than 0");
         return -EINVAL;
     }
+    vexp->handler.blk = exp->blk;
+    vexp->handler.serial = g_strdup("vhost_user_blk");
+    vexp->handler.logical_block_size = logical_block_size;
+    vexp->handler.writable = opts->writable;
     vu_blk_initialize_config(blk_bs(exp->blk), &vexp->blkcfg,
                              logical_block_size, num_queues);
@@ -515,6 +296,7 @@ static int vu_blk_exp_create(BlockExport *exp, BlockExportOptions *opts,
                                 num_queues, &vu_blk_iface, errp)) {
         blk_remove_aio_context_notifier(exp->blk, blk_aio_attached,
                                         blk_aio_detach, vexp);
+        g_free(vexp->handler.serial);
         return -EADDRNOTAVAIL;
     }
@@ -527,6 +309,7 @@ static void vu_blk_exp_delete(BlockExport *exp)
     blk_remove_aio_context_notifier(exp->blk, blk_aio_attached, blk_aio_detach,
                                     vexp);
+    g_free(vexp->handler.serial);
 }
 const BlockExportDriver blk_exp_vhost_user_blk = {

@@ -0,0 +1,240 @@
/*
* Handler for virtio-blk I/O
*
* Copyright (c) 2020 Red Hat, Inc.
* Copyright (C) 2022 Bytedance Inc. and/or its affiliates. All rights reserved.
*
* Author:
* Coiby Xu <coiby.xu@gmail.com>
* Xie Yongji <xieyongji@bytedance.com>
*
* This work is licensed under the terms of the GNU GPL, version 2 or
* later. See the COPYING file in the top-level directory.
*/
#include "qemu/osdep.h"
#include "qemu/error-report.h"
#include "virtio-blk-handler.h"
#include "standard-headers/linux/virtio_blk.h"
struct virtio_blk_inhdr {
unsigned char status;
};
static bool virtio_blk_sect_range_ok(BlockBackend *blk, uint32_t block_size,
uint64_t sector, size_t size)
{
uint64_t nb_sectors;
uint64_t total_sectors;
if (size % VIRTIO_BLK_SECTOR_SIZE) {
return false;
}
nb_sectors = size >> VIRTIO_BLK_SECTOR_BITS;
QEMU_BUILD_BUG_ON(BDRV_SECTOR_SIZE != VIRTIO_BLK_SECTOR_SIZE);
if (nb_sectors > BDRV_REQUEST_MAX_SECTORS) {
return false;
}
if ((sector << VIRTIO_BLK_SECTOR_BITS) % block_size) {
return false;
}
blk_get_geometry(blk, &total_sectors);
if (sector > total_sectors || nb_sectors > total_sectors - sector) {
return false;
}
return true;
}
static int coroutine_fn
virtio_blk_discard_write_zeroes(VirtioBlkHandler *handler, struct iovec *iov,
uint32_t iovcnt, uint32_t type)
{
BlockBackend *blk = handler->blk;
struct virtio_blk_discard_write_zeroes desc;
ssize_t size;
uint64_t sector;
uint32_t num_sectors;
uint32_t max_sectors;
uint32_t flags;
int bytes;
/* Only one desc is currently supported */
if (unlikely(iov_size(iov, iovcnt) > sizeof(desc))) {
return VIRTIO_BLK_S_UNSUPP;
}
size = iov_to_buf(iov, iovcnt, 0, &desc, sizeof(desc));
if (unlikely(size != sizeof(desc))) {
error_report("Invalid size %zd, expected %zu", size, sizeof(desc));
return VIRTIO_BLK_S_IOERR;
}
sector = le64_to_cpu(desc.sector);
num_sectors = le32_to_cpu(desc.num_sectors);
flags = le32_to_cpu(desc.flags);
max_sectors = (type == VIRTIO_BLK_T_WRITE_ZEROES) ?
VIRTIO_BLK_MAX_WRITE_ZEROES_SECTORS :
VIRTIO_BLK_MAX_DISCARD_SECTORS;
/* This check ensures that 'bytes' fits in an int */
if (unlikely(num_sectors > max_sectors)) {
return VIRTIO_BLK_S_IOERR;
}
bytes = num_sectors << VIRTIO_BLK_SECTOR_BITS;
if (unlikely(!virtio_blk_sect_range_ok(blk, handler->logical_block_size,
sector, bytes))) {
return VIRTIO_BLK_S_IOERR;
}
/*
* The device MUST set the status byte to VIRTIO_BLK_S_UNSUPP for discard
* and write zeroes commands if any unknown flag is set.
*/
if (unlikely(flags & ~VIRTIO_BLK_WRITE_ZEROES_FLAG_UNMAP)) {
return VIRTIO_BLK_S_UNSUPP;
}
if (type == VIRTIO_BLK_T_WRITE_ZEROES) {
int blk_flags = 0;
if (flags & VIRTIO_BLK_WRITE_ZEROES_FLAG_UNMAP) {
blk_flags |= BDRV_REQ_MAY_UNMAP;
}
if (blk_co_pwrite_zeroes(blk, sector << VIRTIO_BLK_SECTOR_BITS,
bytes, blk_flags) == 0) {
return VIRTIO_BLK_S_OK;
}
} else if (type == VIRTIO_BLK_T_DISCARD) {
/*
* The device MUST set the status byte to VIRTIO_BLK_S_UNSUPP for
* discard commands if the unmap flag is set.
*/
if (unlikely(flags & VIRTIO_BLK_WRITE_ZEROES_FLAG_UNMAP)) {
return VIRTIO_BLK_S_UNSUPP;
}
if (blk_co_pdiscard(blk, sector << VIRTIO_BLK_SECTOR_BITS,
bytes) == 0) {
return VIRTIO_BLK_S_OK;
}
}
return VIRTIO_BLK_S_IOERR;
}
int coroutine_fn virtio_blk_process_req(VirtioBlkHandler *handler,
struct iovec *in_iov,
struct iovec *out_iov,
unsigned int in_num,
unsigned int out_num)
{
BlockBackend *blk = handler->blk;
struct virtio_blk_inhdr *in;
struct virtio_blk_outhdr out;
uint32_t type;
int in_len;
if (out_num < 1 || in_num < 1) {
error_report("virtio-blk request missing headers");
return -EINVAL;
}
if (unlikely(iov_to_buf(out_iov, out_num, 0, &out,
sizeof(out)) != sizeof(out))) {
error_report("virtio-blk request outhdr too short");
return -EINVAL;
}
iov_discard_front(&out_iov, &out_num, sizeof(out));
if (in_iov[in_num - 1].iov_len < sizeof(struct virtio_blk_inhdr)) {
error_report("virtio-blk request inhdr too short");
return -EINVAL;
}
/* We always touch the last byte, so just see how big in_iov is. */
in_len = iov_size(in_iov, in_num);
in = (void *)in_iov[in_num - 1].iov_base
+ in_iov[in_num - 1].iov_len
- sizeof(struct virtio_blk_inhdr);
iov_discard_back(in_iov, &in_num, sizeof(struct virtio_blk_inhdr));
type = le32_to_cpu(out.type);
switch (type & ~VIRTIO_BLK_T_BARRIER) {
case VIRTIO_BLK_T_IN:
case VIRTIO_BLK_T_OUT: {
QEMUIOVector qiov;
int64_t offset;
ssize_t ret = 0;
bool is_write = type & VIRTIO_BLK_T_OUT;
int64_t sector_num = le64_to_cpu(out.sector);
if (is_write && !handler->writable) {
in->status = VIRTIO_BLK_S_IOERR;
break;
}
if (is_write) {
qemu_iovec_init_external(&qiov, out_iov, out_num);
} else {
qemu_iovec_init_external(&qiov, in_iov, in_num);
}
if (unlikely(!virtio_blk_sect_range_ok(blk,
handler->logical_block_size,
sector_num, qiov.size))) {
in->status = VIRTIO_BLK_S_IOERR;
break;
}
offset = sector_num << VIRTIO_BLK_SECTOR_BITS;
if (is_write) {
ret = blk_co_pwritev(blk, offset, qiov.size, &qiov, 0);
} else {
ret = blk_co_preadv(blk, offset, qiov.size, &qiov, 0);
}
if (ret >= 0) {
in->status = VIRTIO_BLK_S_OK;
} else {
in->status = VIRTIO_BLK_S_IOERR;
}
break;
}
case VIRTIO_BLK_T_FLUSH:
if (blk_co_flush(blk) == 0) {
in->status = VIRTIO_BLK_S_OK;
} else {
in->status = VIRTIO_BLK_S_IOERR;
}
break;
case VIRTIO_BLK_T_GET_ID: {
size_t size = MIN(strlen(handler->serial) + 1,
MIN(iov_size(in_iov, in_num),
VIRTIO_BLK_ID_BYTES));
iov_from_buf(in_iov, in_num, 0, handler->serial, size);
in->status = VIRTIO_BLK_S_OK;
break;
}
case VIRTIO_BLK_T_DISCARD:
case VIRTIO_BLK_T_WRITE_ZEROES:
if (!handler->writable) {
in->status = VIRTIO_BLK_S_IOERR;
break;
}
in->status = virtio_blk_discard_write_zeroes(handler, out_iov,
out_num, type);
break;
default:
in->status = VIRTIO_BLK_S_UNSUPP;
break;
}
return in_len;
}
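virtio_blk_process_req() above relies on the virtio-blk request framing in which the driver-readable buffer ends with a one-byte virtio_blk_inhdr status. A minimal stand-alone sketch of how the status byte is located at the tail of the last in-iovec (locate_status is an illustrative name, not a QEMU function):

```c
#include <assert.h>
#include <stddef.h>
#include <sys/uio.h>

struct virtio_blk_inhdr {
    unsigned char status;
};

/* The status byte occupies the final sizeof(inhdr) bytes of the last
 * in-iovec, exactly as virtio_blk_process_req() computes 'in' above. */
static unsigned char *locate_status(struct iovec *in_iov, unsigned in_num)
{
    struct iovec *last = &in_iov[in_num - 1];
    return (unsigned char *)last->iov_base + last->iov_len -
           sizeof(struct virtio_blk_inhdr);
}
```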

@@ -0,0 +1,37 @@
/*
* Handler for virtio-blk I/O
*
* Copyright (C) 2022 Bytedance Inc. and/or its affiliates. All rights reserved.
*
* Author:
* Xie Yongji <xieyongji@bytedance.com>
*
* This work is licensed under the terms of the GNU GPL, version 2 or
* later. See the COPYING file in the top-level directory.
*/
#ifndef VIRTIO_BLK_HANDLER_H
#define VIRTIO_BLK_HANDLER_H
#include "sysemu/block-backend.h"
#define VIRTIO_BLK_SECTOR_BITS 9
#define VIRTIO_BLK_SECTOR_SIZE (1ULL << VIRTIO_BLK_SECTOR_BITS)
#define VIRTIO_BLK_MAX_DISCARD_SECTORS 32768
#define VIRTIO_BLK_MAX_WRITE_ZEROES_SECTORS 32768
typedef struct {
BlockBackend *blk;
char *serial;
uint32_t logical_block_size;
bool writable;
} VirtioBlkHandler;
int coroutine_fn virtio_blk_process_req(VirtioBlkHandler *handler,
struct iovec *in_iov,
struct iovec *out_iov,
unsigned int in_num,
unsigned int out_num);
#endif /* VIRTIO_BLK_HANDLER_H */

@@ -891,7 +891,7 @@ out:
 static void qemu_gluster_refresh_limits(BlockDriverState *bs, Error **errp)
 {
     bs->bl.max_transfer = GLUSTER_MAX_TRANSFER;
-    bs->bl.max_pdiscard = SIZE_MAX;
+    bs->bl.max_pdiscard = MIN(SIZE_MAX, INT64_MAX);
 }
 static int qemu_gluster_reopen_prepare(BDRVReopenState *state,
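The gluster change above exists because bs->bl.max_pdiscard is a signed 64-bit field: assigning SIZE_MAX on a 64-bit host would wrap negative. A stand-alone sketch of the clamping (clamp_pdiscard is an illustrative name, not QEMU's BlockLimits code):

```c
#include <assert.h>
#include <stdint.h>

/* Clamp an unsigned limit into a signed int64_t field, the effect of
 * MIN(SIZE_MAX, INT64_MAX) in qemu_gluster_refresh_limits(). */
static int64_t clamp_pdiscard(uint64_t requested)
{
    return requested > INT64_MAX ? INT64_MAX : (int64_t)requested;
}
```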


@@ -588,21 +588,6 @@ void bdrv_unapply_subtree_drain(BdrvChild *child, BlockDriverState *old_parent)
     BDRV_POLL_WHILE(child->bs, qatomic_read(&drained_end_counter) > 0);
 }
-/*
- * Wait for pending requests to complete on a single BlockDriverState subtree,
- * and suspend block driver's internal I/O until next request arrives.
- *
- * Note that unlike bdrv_drain_all(), the caller must hold the BlockDriverState
- * AioContext.
- */
-void coroutine_fn bdrv_co_drain(BlockDriverState *bs)
-{
-    IO_OR_GS_CODE();
-    assert(qemu_in_coroutine());
-    bdrv_drained_begin(bs);
-    bdrv_drained_end(bs);
-}
 void bdrv_drain(BlockDriverState *bs)
 {
     IO_OR_GS_CODE();
@@ -1061,14 +1046,6 @@ static int bdrv_check_request32(int64_t offset, int64_t bytes,
     return 0;
 }
-int bdrv_pwrite_zeroes(BdrvChild *child, int64_t offset,
-                       int64_t bytes, BdrvRequestFlags flags)
-{
-    IO_CODE();
-    return bdrv_pwritev(child, offset, bytes, NULL,
-                        BDRV_REQ_ZERO_WRITE | flags);
-}
 /*
  * Completely zero out a block device with the help of bdrv_pwrite_zeroes.
  * The operation is sped up by checking the block status and only writing
@@ -1111,62 +1088,25 @@ int bdrv_make_zero(BdrvChild *child, BdrvRequestFlags flags)
     }
 }
-/* See bdrv_pwrite() for the return codes */
-int bdrv_pread(BdrvChild *child, int64_t offset, void *buf, int64_t bytes)
-{
-    int ret;
-    QEMUIOVector qiov = QEMU_IOVEC_INIT_BUF(qiov, buf, bytes);
-    IO_CODE();
-    if (bytes < 0) {
-        return -EINVAL;
-    }
-    ret = bdrv_preadv(child, offset, bytes, &qiov, 0);
-    return ret < 0 ? ret : bytes;
-}
-/* Return no. of bytes on success or < 0 on error. Important errors are:
-   -EIO         generic I/O error (may happen for all errors)
-   -ENOMEDIUM   No media inserted.
-   -EINVAL      Invalid offset or number of bytes
-   -EACCES      Trying to write a read-only device
- */
-int bdrv_pwrite(BdrvChild *child, int64_t offset, const void *buf,
-                int64_t bytes)
-{
-    int ret;
-    QEMUIOVector qiov = QEMU_IOVEC_INIT_BUF(qiov, buf, bytes);
-    IO_CODE();
-    if (bytes < 0) {
-        return -EINVAL;
-    }
-    ret = bdrv_pwritev(child, offset, bytes, &qiov, 0);
-    return ret < 0 ? ret : bytes;
-}
 /*
  * Writes to the file and ensures that no writes are reordered across this
  * request (acts as a barrier)
  *
  * Returns 0 on success, -errno in error cases.
  */
-int bdrv_pwrite_sync(BdrvChild *child, int64_t offset,
-                     const void *buf, int64_t count)
+int coroutine_fn bdrv_co_pwrite_sync(BdrvChild *child, int64_t offset,
+                                     int64_t bytes, const void *buf,
+                                     BdrvRequestFlags flags)
 {
     int ret;
     IO_CODE();
-    ret = bdrv_pwrite(child, offset, buf, count);
+    ret = bdrv_co_pwrite(child, offset, bytes, buf, flags);
     if (ret < 0) {
         return ret;
     }
-    ret = bdrv_flush(child->bs);
+    ret = bdrv_co_flush(child->bs);
     if (ret < 0) {
         return ret;
     }


@@ -73,12 +73,8 @@ static void luring_resubmit(LuringState *s, LuringAIOCB *luringcb)
 /**
  * luring_resubmit_short_read:
  *
- * Before Linux commit 9d93a3f5a0c ("io_uring: punt short reads to async
- * context") a buffered I/O request with the start of the file range in the
- * page cache could result in a short read. Applications need to resubmit the
- * remaining read request.
- *
- * This is a slow path but recent kernels never take it.
+ * Short reads are rare but may occur. The remaining read request needs to be
+ * resubmitted.
  */
 static void luring_resubmit_short_read(LuringState *s, LuringAIOCB *luringcb,
                                        int nread)
@@ -89,7 +85,7 @@ static void luring_resubmit_short_read(LuringState *s, LuringAIOCB *luringcb,
     trace_luring_resubmit_short_read(s, luringcb, nread);
     /* Update read position */
-    luringcb->total_read = nread;
+    luringcb->total_read += nread;
     remaining = luringcb->qiov->size - luringcb->total_read;
     /* Shorten qiov */
@@ -103,7 +99,7 @@ static void luring_resubmit_short_read(LuringState *s, LuringAIOCB *luringcb,
                               remaining);
     /* Update sqe */
-    luringcb->sqeq.off = nread;
+    luringcb->sqeq.off += nread;
     luringcb->sqeq.addr = (__u64)(uintptr_t)luringcb->resubmit_qiov.iov;
     luringcb->sqeq.len = luringcb->resubmit_qiov.niov;
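The two one-character fixes above (= becomes +=) matter when a request is resubmitted more than once: both the running byte count and the submission offset must accumulate across partial completions. A stand-alone sketch of the corrected accounting (the struct and function names are invented for illustration):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct short_read_state {
    size_t total_read; /* bytes completed so far (luringcb->total_read) */
    uint64_t off;      /* next submission offset (luringcb->sqeq.off) */
};

/* Advance both counters after a partial completion of nread bytes */
static void account_short_read(struct short_read_state *s, size_t nread)
{
    s->total_read += nread;
    s->off += nread;
}
```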


@@ -136,6 +136,7 @@ block_gen_c = custom_target('block-gen.c',
     input: files(
         '../include/block/block-io.h',
         '../include/block/block-global-state.h',
+        '../include/sysemu/block-backend-io.h',
         'coroutines.h'
     ),
     command: [wrapper_py, '@OUTPUT@', '@INPUT@'])


@@ -73,7 +73,7 @@ typedef struct MirrorBlockJob {
     uint64_t last_pause_ns;
     unsigned long *in_flight_bitmap;
-    int in_flight;
+    unsigned in_flight;
     int64_t bytes_in_flight;
     QTAILQ_HEAD(, MirrorOp) ops_in_flight;
     int ret;


@@ -261,8 +261,9 @@ BdrvDirtyBitmap *block_dirty_bitmap_merge(const char *node, const char *target,
                                           HBitmap **backup, Error **errp)
 {
     BlockDriverState *bs;
-    BdrvDirtyBitmap *dst, *src, *anon;
+    BdrvDirtyBitmap *dst, *src;
     BlockDirtyBitmapOrStrList *lst;
+    HBitmap *local_backup = NULL;

     GLOBAL_STATE_CODE();
@@ -271,12 +272,6 @@ BdrvDirtyBitmap *block_dirty_bitmap_merge(const char *node, const char *target,
         return NULL;
     }

-    anon = bdrv_create_dirty_bitmap(bs, bdrv_dirty_bitmap_granularity(dst),
-                                    NULL, errp);
-    if (!anon) {
-        return NULL;
-    }
-
     for (lst = bms; lst; lst = lst->next) {
         switch (lst->value->type) {
             const char *name, *node;
@@ -285,8 +280,7 @@ BdrvDirtyBitmap *block_dirty_bitmap_merge(const char *node, const char *target,
             src = bdrv_find_dirty_bitmap(bs, name);
             if (!src) {
                 error_setg(errp, "Dirty bitmap '%s' not found", name);
-                dst = NULL;
-                goto out;
+                goto fail;
             }
             break;
         case QTYPE_QDICT:
@@ -294,26 +288,36 @@ BdrvDirtyBitmap *block_dirty_bitmap_merge(const char *node, const char *target,
             name = lst->value->u.external.name;
             src = block_dirty_bitmap_lookup(node, name, NULL, errp);
             if (!src) {
-                dst = NULL;
-                goto out;
+                goto fail;
             }
             break;
         default:
             abort();
         }

-        if (!bdrv_merge_dirty_bitmap(anon, src, NULL, errp)) {
-            dst = NULL;
-            goto out;
+        /* We do backup only for first merge operation */
+        if (!bdrv_merge_dirty_bitmap(dst, src,
+                                     local_backup ? NULL : &local_backup,
+                                     errp))
+        {
+            goto fail;
         }
     }

-    /* Merge into dst; dst is unchanged on failure. */
-    bdrv_merge_dirty_bitmap(dst, anon, backup, errp);
+    if (backup) {
+        *backup = local_backup;
+    } else {
+        hbitmap_free(local_backup);
+    }

- out:
-    bdrv_release_dirty_bitmap(anon);
     return dst;
+
+fail:
+    if (local_backup) {
+        bdrv_restore_dirty_bitmap(dst, local_backup);
+    }
+    return NULL;
 }

 void qmp_block_dirty_bitmap_merge(const char *node, const char *target,
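The rewritten merge loop above drops the anonymous intermediate bitmap: it merges each source directly into `dst`, snapshots `dst` only before the first merge, and rolls `dst` back from that snapshot if any later merge fails. A minimal sketch of that control flow, using toy `uint64_t` "bitmaps" and hypothetical `toy_*` helpers rather than QEMU's real `HBitmap` API:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t ToyBitmap;

/* Hypothetical merge helper: fails when a source is "invalid" (zero),
 * and snapshots dst into *backup before changing it, if requested. */
static bool toy_merge(ToyBitmap *dst, ToyBitmap src, ToyBitmap *backup)
{
    if (src == 0) {
        return false;            /* simulated lookup/merge failure */
    }
    if (backup) {
        *backup = *dst;          /* backup taken before the first change */
    }
    *dst |= src;
    return true;
}

static bool toy_merge_list(ToyBitmap *dst, const ToyBitmap *srcs, size_t n)
{
    ToyBitmap local_backup = 0;
    bool have_backup = false;

    for (size_t i = 0; i < n; i++) {
        /* We do backup only for the first merge operation. */
        if (!toy_merge(dst, srcs[i], have_backup ? NULL : &local_backup)) {
            goto fail;
        }
        have_backup = true;
    }
    return true;

fail:
    if (have_backup) {
        *dst = local_backup;     /* restore dst; caller sees it unchanged */
    }
    return false;
}
```

On failure the caller observes `dst` exactly as it was before the call, which is the invariant the anonymous-bitmap version bought at the cost of an extra allocation and a second merge pass.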

View File

@@ -77,7 +77,7 @@ typedef struct BDRVNBDState {
     QemuMutex requests_lock;
     NBDClientState state;
     CoQueue free_sema;
-    int in_flight;
+    unsigned in_flight;
     NBDClientRequest requests[MAX_NBD_REQUESTS];
     QEMUTimer *reconnect_delay_timer;
@@ -371,6 +371,7 @@ static bool nbd_client_connecting(BDRVNBDState *s)
 /* Called with s->requests_lock taken.  */
 static coroutine_fn void nbd_reconnect_attempt(BDRVNBDState *s)
 {
+    int ret;
     bool blocking = s->state == NBD_CLIENT_CONNECTING_WAIT;

     /*
@@ -380,6 +381,8 @@ static coroutine_fn void nbd_reconnect_attempt(BDRVNBDState *s)
     assert(nbd_client_connecting(s));
     assert(s->in_flight == 1);

+    trace_nbd_reconnect_attempt(s->bs->in_flight);
+
     if (blocking && !s->reconnect_delay_timer) {
         /*
          * It's the first reconnect attempt after switching to
@@ -401,7 +404,8 @@ static coroutine_fn void nbd_reconnect_attempt(BDRVNBDState *s)
     }

     qemu_mutex_unlock(&s->requests_lock);
-    nbd_co_do_establish_connection(s->bs, blocking, NULL);
+    ret = nbd_co_do_establish_connection(s->bs, blocking, NULL);
+    trace_nbd_reconnect_attempt_result(ret, s->bs->in_flight);
     qemu_mutex_lock(&s->requests_lock);

     /*
@@ -521,12 +525,8 @@ static int coroutine_fn nbd_co_send_request(BlockDriverState *bs,
     if (qiov) {
         qio_channel_set_cork(s->ioc, true);
         rc = nbd_send_request(s->ioc, request);
-        if (rc >= 0) {
-            if (qio_channel_writev_all(s->ioc, qiov->iov, qiov->niov,
-                                       NULL) < 0) {
-                rc = -EIO;
-            }
-        } else if (rc >= 0) {
+        if (rc >= 0 && qio_channel_writev_all(s->ioc, qiov->iov, qiov->niov,
+                                              NULL) < 0) {
             rc = -EIO;
         }
         qio_channel_set_cork(s->ioc, false);
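The last hunk above folds a nested error check into one short-circuit condition; note the removed `else if (rc >= 0)` branch was unreachable, since `rc` is negative on that path. A minimal sketch of the equivalence, with `fake_send`/`fake_writev` standing in for `nbd_send_request()`/`qio_channel_writev_all()` (negative return means failure, as in QEMU's convention):

```c
#include <assert.h>
#include <stdbool.h>

static int fake_send(bool ok)   { return ok ? 0 : -1; }
static int fake_writev(bool ok) { return ok ? 0 : -1; }

/* Old shape: nested check, plus a dead "else if (rc >= 0)" branch. */
static int send_old(bool send_ok, bool writev_ok)
{
    int rc = fake_send(send_ok);
    if (rc >= 0) {
        if (fake_writev(writev_ok) < 0) {
            rc = -5; /* -EIO */
        }
    } else if (rc >= 0) {   /* never true: rc < 0 on this path */
        rc = -5;
    }
    return rc;
}

/* New shape: one short-circuit condition, same observable behaviour. */
static int send_new(bool send_ok, bool writev_ok)
{
    int rc = fake_send(send_ok);
    if (rc >= 0 && fake_writev(writev_ok) < 0) {
        rc = -5; /* -EIO */
    }
    return rc;
}
```

The short-circuit `&&` guarantees the payload write is only attempted when the request header went out successfully, just as the nested form did.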

View File

@@ -93,8 +93,8 @@ static int parallels_load_bitmap_data(BlockDriverState *bs,
         if (entry == 1) {
             bdrv_dirty_bitmap_deserialize_ones(bitmap, offset, count, false);
         } else {
-            ret = bdrv_pread(bs->file, entry << BDRV_SECTOR_BITS, buf,
-                             s->cluster_size);
+            ret = bdrv_pread(bs->file, entry << BDRV_SECTOR_BITS,
+                             s->cluster_size, buf, 0);
             if (ret < 0) {
                 error_setg_errno(errp, -ret,
                                  "Failed to read bitmap data cluster");
@@ -286,7 +286,7 @@ int parallels_read_format_extension(BlockDriverState *bs,
     assert(ext_off > 0);

-    ret = bdrv_pread(bs->file, ext_off, ext_cluster, s->cluster_size);
+    ret = bdrv_pread(bs->file, ext_off, s->cluster_size, ext_cluster, 0);
     if (ret < 0) {
         error_setg_errno(errp, -ret, "Failed to read Format Extension cluster");
         goto out;
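This and the following files are mechanical conversions to the reordered I/O signature: `bdrv_pread(child, offset, buf, bytes)` becomes `bdrv_pread(child, offset, bytes, buf, flags)`, putting the byte count next to the offset and adding a `BdrvRequestFlags` argument (`0` in all these hunks). A toy sketch contrasting the two orderings against an in-memory "file" (not the real QEMU API, which takes a `BdrvChild` and does actual I/O):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Toy in-memory "file" standing in for a BdrvChild. */
typedef struct { uint8_t data[64]; } ToyFile;

/* Old-style ordering: (file, offset, buf, bytes). */
static int toy_pread_old(ToyFile *f, int64_t off, void *buf, int64_t bytes)
{
    memcpy(buf, f->data + off, bytes);
    return (int)bytes;
}

/* New-style ordering: (file, offset, bytes, buf, flags), 0 on success. */
static int toy_pread_new(ToyFile *f, int64_t off, int64_t bytes, void *buf,
                         int flags)
{
    (void)flags;                 /* BdrvRequestFlags; 0 in these hunks */
    memcpy(buf, f->data + off, bytes);
    return 0;
}
```

Keeping `offset` and `bytes` adjacent matches the vectored `bdrv_preadv`/`bdrv_pwritev` family, which is what makes the conversion in these hunks purely positional.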

View File

@@ -277,8 +277,8 @@ static coroutine_fn int parallels_co_flush_to_os(BlockDriverState *bs)
         if (off + to_write > s->header_size) {
             to_write = s->header_size - off;
         }
-        ret = bdrv_pwrite(bs->file, off, (uint8_t *)s->header + off,
-                          to_write);
+        ret = bdrv_pwrite(bs->file, off, to_write, (uint8_t *)s->header + off,
+                          0);
         if (ret < 0) {
             qemu_co_mutex_unlock(&s->lock);
             return ret;
@@ -481,7 +481,7 @@ static int coroutine_fn parallels_co_check(BlockDriverState *bs,
     ret = 0;
     if (flush_bat) {
-        ret = bdrv_pwrite_sync(bs->file, 0, s->header, s->header_size);
+        ret = bdrv_co_pwrite_sync(bs->file, 0, s->header_size, s->header, 0);
         if (ret < 0) {
             res->check_errors++;
             goto out;
@@ -599,7 +599,7 @@ static int coroutine_fn parallels_co_create(BlockdevCreateOptions* opts,
     memset(tmp, 0, sizeof(tmp));
     memcpy(tmp, &header, sizeof(header));

-    ret = blk_pwrite(blk, 0, tmp, BDRV_SECTOR_SIZE, 0);
+    ret = blk_pwrite(blk, 0, BDRV_SECTOR_SIZE, tmp, 0);
     if (ret < 0) {
         goto exit;
     }
@@ -723,7 +723,7 @@ static int parallels_update_header(BlockDriverState *bs)
     if (size > s->header_size) {
         size = s->header_size;
     }
-    return bdrv_pwrite_sync(bs->file, 0, s->header, size);
+    return bdrv_pwrite_sync(bs->file, 0, size, s->header, 0);
 }

 static int parallels_open(BlockDriverState *bs, QDict *options, int flags,
@@ -742,7 +742,7 @@ static int parallels_open(BlockDriverState *bs, QDict *options, int flags,
         return -EINVAL;
     }

-    ret = bdrv_pread(bs->file, 0, &ph, sizeof(ph));
+    ret = bdrv_pread(bs->file, 0, sizeof(ph), &ph, 0);
     if (ret < 0) {
         goto fail;
     }
@@ -798,7 +798,7 @@ static int parallels_open(BlockDriverState *bs, QDict *options, int flags,
         s->header_size = size;
     }

-    ret = bdrv_pread(bs->file, 0, s->header, s->header_size);
+    ret = bdrv_pread(bs->file, 0, s->header_size, s->header, 0);
     if (ret < 0) {
         goto fail;
     }

View File

@@ -128,7 +128,7 @@ static int qcow_open(BlockDriverState *bs, QDict *options, int flags,
         goto fail;
     }

-    ret = bdrv_pread(bs->file, 0, &header, sizeof(header));
+    ret = bdrv_pread(bs->file, 0, sizeof(header), &header, 0);
     if (ret < 0) {
         goto fail;
     }
@@ -260,8 +260,8 @@ static int qcow_open(BlockDriverState *bs, QDict *options, int flags,
         goto fail;
     }

-    ret = bdrv_pread(bs->file, s->l1_table_offset, s->l1_table,
-                     s->l1_size * sizeof(uint64_t));
+    ret = bdrv_pread(bs->file, s->l1_table_offset,
+                     s->l1_size * sizeof(uint64_t), s->l1_table, 0);
     if (ret < 0) {
         goto fail;
     }
@@ -291,8 +291,8 @@ static int qcow_open(BlockDriverState *bs, QDict *options, int flags,
             ret = -EINVAL;
             goto fail;
         }
-        ret = bdrv_pread(bs->file, header.backing_file_offset,
-                         bs->auto_backing_file, len);
+        ret = bdrv_pread(bs->file, header.backing_file_offset, len,
+                         bs->auto_backing_file, 0);
         if (ret < 0) {
             goto fail;
         }
@@ -383,7 +383,7 @@ static int get_cluster_offset(BlockDriverState *bs,
         BLKDBG_EVENT(bs->file, BLKDBG_L1_UPDATE);
         ret = bdrv_pwrite_sync(bs->file,
                                s->l1_table_offset + l1_index * sizeof(tmp),
-                               &tmp, sizeof(tmp));
+                               sizeof(tmp), &tmp, 0);
         if (ret < 0) {
             return ret;
         }
@@ -414,14 +414,14 @@ static int get_cluster_offset(BlockDriverState *bs,
         BLKDBG_EVENT(bs->file, BLKDBG_L2_LOAD);
         if (new_l2_table) {
             memset(l2_table, 0, s->l2_size * sizeof(uint64_t));
-            ret = bdrv_pwrite_sync(bs->file, l2_offset, l2_table,
-                                   s->l2_size * sizeof(uint64_t));
+            ret = bdrv_pwrite_sync(bs->file, l2_offset,
+                                   s->l2_size * sizeof(uint64_t), l2_table, 0);
             if (ret < 0) {
                 return ret;
             }
         } else {
-            ret = bdrv_pread(bs->file, l2_offset, l2_table,
-                             s->l2_size * sizeof(uint64_t));
+            ret = bdrv_pread(bs->file, l2_offset, s->l2_size * sizeof(uint64_t),
+                             l2_table, 0);
             if (ret < 0) {
                 return ret;
             }
@@ -453,8 +453,8 @@ static int get_cluster_offset(BlockDriverState *bs,
                 cluster_offset = QEMU_ALIGN_UP(cluster_offset, s->cluster_size);
                 /* write the cluster content */
                 BLKDBG_EVENT(bs->file, BLKDBG_WRITE_AIO);
-                ret = bdrv_pwrite(bs->file, cluster_offset, s->cluster_cache,
-                                  s->cluster_size);
+                ret = bdrv_pwrite(bs->file, cluster_offset, s->cluster_size,
+                                  s->cluster_cache, 0);
                 if (ret < 0) {
                     return ret;
                 }
@@ -492,10 +492,9 @@ static int get_cluster_offset(BlockDriverState *bs,
                         return -EIO;
                     }
                     BLKDBG_EVENT(bs->file, BLKDBG_WRITE_AIO);
-                    ret = bdrv_pwrite(bs->file,
-                                      cluster_offset + i,
-                                      s->cluster_data,
-                                      BDRV_SECTOR_SIZE);
+                    ret = bdrv_pwrite(bs->file, cluster_offset + i,
+                                      BDRV_SECTOR_SIZE,
+                                      s->cluster_data, 0);
                     if (ret < 0) {
                         return ret;
                     }
@@ -516,7 +515,7 @@ static int get_cluster_offset(BlockDriverState *bs,
             BLKDBG_EVENT(bs->file, BLKDBG_L2_UPDATE);
         }
         ret = bdrv_pwrite_sync(bs->file, l2_offset + l2_index * sizeof(tmp),
-                               &tmp, sizeof(tmp));
+                               sizeof(tmp), &tmp, 0);
         if (ret < 0) {
             return ret;
         }
@@ -597,8 +596,8 @@ static int decompress_cluster(BlockDriverState *bs, uint64_t cluster_offset)
         csize = cluster_offset >> (63 - s->cluster_bits);
         csize &= (s->cluster_size - 1);
         BLKDBG_EVENT(bs->file, BLKDBG_READ_COMPRESSED);
-        ret = bdrv_pread(bs->file, coffset, s->cluster_data, csize);
-        if (ret != csize)
+        ret = bdrv_pread(bs->file, coffset, csize, s->cluster_data, 0);
+        if (ret < 0)
             return -1;
         if (decompress_buffer(s->cluster_cache, s->cluster_size,
                               s->cluster_data, csize) < 0) {
@@ -891,15 +890,15 @@ static int coroutine_fn qcow_co_create(BlockdevCreateOptions *opts,
     }

     /* write all the data */
-    ret = blk_pwrite(qcow_blk, 0, &header, sizeof(header), 0);
-    if (ret != sizeof(header)) {
+    ret = blk_pwrite(qcow_blk, 0, sizeof(header), &header, 0);
+    if (ret < 0) {
         goto exit;
     }

     if (qcow_opts->has_backing_file) {
-        ret = blk_pwrite(qcow_blk, sizeof(header),
-                         qcow_opts->backing_file, backing_filename_len, 0);
-        if (ret != backing_filename_len) {
+        ret = blk_pwrite(qcow_blk, sizeof(header), backing_filename_len,
+                         qcow_opts->backing_file, 0);
+        if (ret < 0) {
             goto exit;
         }
     }
@@ -908,8 +907,8 @@ static int coroutine_fn qcow_co_create(BlockdevCreateOptions *opts,
     for (i = 0; i < DIV_ROUND_UP(sizeof(uint64_t) * l1_size, BDRV_SECTOR_SIZE);
          i++) {
         ret = blk_pwrite(qcow_blk, header_size + BDRV_SECTOR_SIZE * i,
-                         tmp, BDRV_SECTOR_SIZE, 0);
-        if (ret != BDRV_SECTOR_SIZE) {
+                         BDRV_SECTOR_SIZE, tmp, 0);
+        if (ret < 0) {
             g_free(tmp);
             goto exit;
         }
@@ -1030,8 +1029,8 @@ static int qcow_make_empty(BlockDriverState *bs)
     int ret;

     memset(s->l1_table, 0, l1_length);
-    if (bdrv_pwrite_sync(bs->file, s->l1_table_offset, s->l1_table,
-                         l1_length) < 0)
+    if (bdrv_pwrite_sync(bs->file, s->l1_table_offset, l1_length, s->l1_table,
+                         0) < 0)
         return -1;
     ret = bdrv_truncate(bs->file, s->l1_table_offset + l1_length, false,
                         PREALLOC_MODE_OFF, 0, NULL);
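Besides the argument reorder, several hunks above also change the caller-side success check: the old `blk_pwrite()`/`bdrv_pread()` returned the byte count on success (so callers compared `ret != len`), while the converted calls return 0 on success (so `ret < 0` suffices). A toy sketch of the two conventions and their matching caller checks (hypothetical `write_old`/`write_new` stand-ins, not the real API):

```c
#include <assert.h>

/* Old convention: success returns the byte count; new convention:
 * success returns 0.  Both report errors as negative errno values. */
static int write_old(int len, int err) { return err ? err : len; }
static int write_new(int len, int err) { (void)len; return err ? err : 0; }

/* Caller-side checks as they appear before/after the conversion. */
static int caller_old(int len, int err)
{
    int ret = write_old(len, err);
    return (ret != len) ? -1 : 0;   /* e.g. "if (ret != csize)" */
}

static int caller_new(int len, int err)
{
    int ret = write_new(len, err);
    return (ret < 0) ? -1 : 0;      /* e.g. "if (ret < 0)" */
}
```

Since the new functions never return a short count, `ret < 0` is the only failure case left, which is why hunks like the `decompress_cluster()` one can drop the `ret != csize` comparison.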

View File

@@ -234,8 +234,8 @@ static int bitmap_table_load(BlockDriverState *bs, Qcow2BitmapTable *tb,
     }

     assert(tb->size <= BME_MAX_TABLE_SIZE);
-    ret = bdrv_pread(bs->file, tb->offset,
-                     table, tb->size * BME_TABLE_ENTRY_SIZE);
+    ret = bdrv_pread(bs->file, tb->offset, tb->size * BME_TABLE_ENTRY_SIZE,
+                     table, 0);
     if (ret < 0) {
         goto fail;
     }
@@ -317,7 +317,7 @@ static int load_bitmap_data(BlockDriverState *bs,
                  * already cleared */
             }
         } else {
-            ret = bdrv_pread(bs->file, data_offset, buf, s->cluster_size);
+            ret = bdrv_pread(bs->file, data_offset, s->cluster_size, buf, 0);
             if (ret < 0) {
                 goto finish;
             }
@@ -575,7 +575,7 @@ static Qcow2BitmapList *bitmap_list_load(BlockDriverState *bs, uint64_t offset,
     }
     dir_end = dir + size;

-    ret = bdrv_pread(bs->file, offset, dir, size);
+    ret = bdrv_pread(bs->file, offset, size, dir, 0);
     if (ret < 0) {
         error_setg_errno(errp, -ret, "Failed to read bitmap directory");
         goto fail;
@@ -798,7 +798,7 @@ static int bitmap_list_store(BlockDriverState *bs, Qcow2BitmapList *bm_list,
         goto fail;
     }

-    ret = bdrv_pwrite(bs->file, dir_offset, dir, dir_size);
+    ret = bdrv_pwrite(bs->file, dir_offset, dir_size, dir, 0);
     if (ret < 0) {
         goto fail;
     }
@@ -1339,7 +1339,7 @@ static uint64_t *store_bitmap_data(BlockDriverState *bs,
             goto fail;
         }

-        ret = bdrv_pwrite(bs->file, off, buf, s->cluster_size);
+        ret = bdrv_pwrite(bs->file, off, s->cluster_size, buf, 0);
         if (ret < 0) {
             error_setg_errno(errp, -ret, "Failed to write bitmap '%s' to file",
                              bm_name);
@@ -1402,7 +1402,7 @@ static int store_bitmap(BlockDriverState *bs, Qcow2Bitmap *bm, Error **errp)
     }

     bitmap_table_to_be(tb, tb_size);
-    ret = bdrv_pwrite(bs->file, tb_offset, tb, tb_size * sizeof(tb[0]));
+    ret = bdrv_pwrite(bs->file, tb_offset, tb_size * sizeof(tb[0]), tb, 0);
     if (ret < 0) {
         error_setg_errno(errp, -ret, "Failed to write bitmap '%s' to file",
                          bm_name);

View File

@@ -223,8 +223,8 @@ static int qcow2_cache_entry_flush(BlockDriverState *bs, Qcow2Cache *c, int i)
         BLKDBG_EVENT(bs->file, BLKDBG_L2_UPDATE);
     }

-    ret = bdrv_pwrite(bs->file, c->entries[i].offset,
-                      qcow2_cache_get_table_addr(c, i), c->table_size);
+    ret = bdrv_pwrite(bs->file, c->entries[i].offset, c->table_size,
+                      qcow2_cache_get_table_addr(c, i), 0);
     if (ret < 0) {
         return ret;
     }
@@ -379,9 +379,8 @@ static int qcow2_cache_do_get(BlockDriverState *bs, Qcow2Cache *c,
             BLKDBG_EVENT(bs->file, BLKDBG_L2_LOAD);
         }

-        ret = bdrv_pread(bs->file, offset,
-                         qcow2_cache_get_table_addr(c, i),
-                         c->table_size);
+        ret = bdrv_pread(bs->file, offset, c->table_size,
+                         qcow2_cache_get_table_addr(c, i), 0);
         if (ret < 0) {
             return ret;
         }

View File

@@ -159,8 +159,8 @@ int qcow2_grow_l1_table(BlockDriverState *bs, uint64_t min_size,
     BLKDBG_EVENT(bs->file, BLKDBG_L1_GROW_WRITE_TABLE);
     for(i = 0; i < s->l1_size; i++)
         new_l1_table[i] = cpu_to_be64(new_l1_table[i]);
-    ret = bdrv_pwrite_sync(bs->file, new_l1_table_offset,
-                           new_l1_table, new_l1_size2);
+    ret = bdrv_pwrite_sync(bs->file, new_l1_table_offset, new_l1_size2,
+                           new_l1_table, 0);
     if (ret < 0)
         goto fail;
     for(i = 0; i < s->l1_size; i++)
@@ -171,7 +171,7 @@ int qcow2_grow_l1_table(BlockDriverState *bs, uint64_t min_size,
     stl_be_p(data, new_l1_size);
     stq_be_p(data + 4, new_l1_table_offset);
     ret = bdrv_pwrite_sync(bs->file, offsetof(QCowHeader, l1_size),
-                           data, sizeof(data));
+                           sizeof(data), data, 0);
     if (ret < 0) {
         goto fail;
     }
@@ -249,7 +249,7 @@ int qcow2_write_l1_entry(BlockDriverState *bs, int l1_index)
     BLKDBG_EVENT(bs->file, BLKDBG_L1_UPDATE);
     ret = bdrv_pwrite_sync(bs->file,
                            s->l1_table_offset + L1E_SIZE * l1_start_index,
-                           buf, bufsize);
+                           bufsize, buf, 0);
     if (ret < 0) {
         return ret;
     }
@@ -2260,7 +2260,8 @@ static int expand_zero_clusters_in_l1(BlockDriverState *bs, uint64_t *l1_table,
                                           (void **)&l2_slice);
             } else {
                 /* load inactive L2 tables from disk */
-                ret = bdrv_pread(bs->file, slice_offset, l2_slice, slice_size2);
+                ret = bdrv_pread(bs->file, slice_offset, slice_size2,
+                                 l2_slice, 0);
             }
             if (ret < 0) {
                 goto fail;
@@ -2376,8 +2377,8 @@ static int expand_zero_clusters_in_l1(BlockDriverState *bs, uint64_t *l1_table,
                     goto fail;
                 }

-                ret = bdrv_pwrite(bs->file, slice_offset,
-                                  l2_slice, slice_size2);
+                ret = bdrv_pwrite(bs->file, slice_offset, slice_size2,
+                                  l2_slice, 0);
                 if (ret < 0) {
                     goto fail;
                 }
@@ -2470,8 +2471,8 @@ int qcow2_expand_zero_clusters(BlockDriverState *bs,
             }
             l1_table = new_l1_table;

-            ret = bdrv_pread(bs->file, s->snapshots[i].l1_table_offset,
-                             l1_table, l1_size2);
+            ret = bdrv_pread(bs->file, s->snapshots[i].l1_table_offset, l1_size2,
+                             l1_table, 0);
             if (ret < 0) {
                 goto fail;
             }

View File

@@ -119,7 +119,7 @@ int qcow2_refcount_init(BlockDriverState *bs)
         }
         BLKDBG_EVENT(bs->file, BLKDBG_REFTABLE_LOAD);
         ret = bdrv_pread(bs->file, s->refcount_table_offset,
-                         s->refcount_table, refcount_table_size2);
+                         refcount_table_size2, s->refcount_table, 0);
         if (ret < 0) {
             goto fail;
         }
@@ -439,7 +439,7 @@ static int alloc_refcount_block(BlockDriverState *bs,
     BLKDBG_EVENT(bs->file, BLKDBG_REFBLOCK_ALLOC_HOOKUP);
     ret = bdrv_pwrite_sync(bs->file, s->refcount_table_offset +
                            refcount_table_index * REFTABLE_ENTRY_SIZE,
-                           &data64, sizeof(data64));
+                           sizeof(data64), &data64, 0);
     if (ret < 0) {
         goto fail;
     }
@@ -684,8 +684,8 @@ int64_t qcow2_refcount_area(BlockDriverState *bs, uint64_t start_offset,
     }

     BLKDBG_EVENT(bs->file, BLKDBG_REFBLOCK_ALLOC_WRITE_TABLE);
-    ret = bdrv_pwrite_sync(bs->file, table_offset, new_table,
-                           table_size * REFTABLE_ENTRY_SIZE);
+    ret = bdrv_pwrite_sync(bs->file, table_offset,
+                           table_size * REFTABLE_ENTRY_SIZE, new_table, 0);
     if (ret < 0) {
         goto fail;
     }
@@ -704,7 +704,7 @@ int64_t qcow2_refcount_area(BlockDriverState *bs, uint64_t start_offset,
     BLKDBG_EVENT(bs->file, BLKDBG_REFBLOCK_ALLOC_SWITCH_TABLE);
     ret = bdrv_pwrite_sync(bs->file,
                            offsetof(QCowHeader, refcount_table_offset),
-                           &data, sizeof(data));
+                           sizeof(data), &data, 0);
     if (ret < 0) {
         goto fail;
     }
@@ -1274,7 +1274,7 @@ int qcow2_update_snapshot_refcount(BlockDriverState *bs,
         }
         l1_allocated = true;

-        ret = bdrv_pread(bs->file, l1_table_offset, l1_table, l1_size2);
+        ret = bdrv_pread(bs->file, l1_table_offset, l1_size2, l1_table, 0);
         if (ret < 0) {
             goto fail;
         }
@@ -1435,8 +1435,8 @@ fail:
             cpu_to_be64s(&l1_table[i]);
         }

-        ret = bdrv_pwrite_sync(bs->file, l1_table_offset,
-                               l1_table, l1_size2);
+        ret = bdrv_pwrite_sync(bs->file, l1_table_offset, l1_size2, l1_table,
+                               0);

         for (i = 0; i < l1_size; i++) {
             be64_to_cpus(&l1_table[i]);
@@ -1633,8 +1633,8 @@ static int fix_l2_entry_by_zero(BlockDriverState *bs, BdrvCheckResult *res,
         goto fail;
     }

-    ret = bdrv_pwrite_sync(bs->file, l2e_offset, &l2_table[idx],
-                           l2_entry_size(s));
+    ret = bdrv_pwrite_sync(bs->file, l2e_offset, l2_entry_size(s),
+                           &l2_table[idx], 0);
     if (ret < 0) {
         fprintf(stderr, "ERROR: Failed to overwrite L2 "
                 "table entry: %s\n", strerror(-ret));
@@ -1672,7 +1672,7 @@ static int check_refcounts_l2(BlockDriverState *bs, BdrvCheckResult *res,
     bool metadata_overlap;

     /* Read L2 table from disk */
-    ret = bdrv_pread(bs->file, l2_offset, l2_table, l2_size_bytes);
+    ret = bdrv_pread(bs->file, l2_offset, l2_size_bytes, l2_table, 0);
     if (ret < 0) {
         fprintf(stderr, "ERROR: I/O error in check_refcounts_l2\n");
         res->check_errors++;
@@ -1888,7 +1888,7 @@ static int check_refcounts_l1(BlockDriverState *bs,
     }

     /* Read L1 table entries from disk */
-    ret = bdrv_pread(bs->file, l1_table_offset, l1_table, l1_size_bytes);
+    ret = bdrv_pread(bs->file, l1_table_offset, l1_size_bytes, l1_table, 0);
     if (ret < 0) {
         fprintf(stderr, "ERROR: I/O error in check_refcounts_l1\n");
         res->check_errors++;
@@ -2004,8 +2004,8 @@ static int check_oflag_copied(BlockDriverState *bs, BdrvCheckResult *res,
             }
         }

-        ret = bdrv_pread(bs->file, l2_offset, l2_table,
-                         s->l2_size * l2_entry_size(s));
+        ret = bdrv_pread(bs->file, l2_offset, s->l2_size * l2_entry_size(s),
+                         l2_table, 0);
         if (ret < 0) {
             fprintf(stderr, "ERROR: Could not read L2 table: %s\n",
                     strerror(-ret));
@@ -2058,8 +2058,8 @@ static int check_oflag_copied(BlockDriverState *bs, BdrvCheckResult *res,
                 goto fail;
             }

-            ret = bdrv_pwrite(bs->file, l2_offset, l2_table,
-                              s->cluster_size);
+            ret = bdrv_pwrite(bs->file, l2_offset, s->cluster_size, l2_table,
+                              0);
             if (ret < 0) {
                 fprintf(stderr, "ERROR: Could not write L2 table: %s\n",
                         strerror(-ret));
@@ -2577,8 +2577,8 @@ static int rebuild_refcounts_write_refblocks(
             on_disk_refblock = (void *)((char *) *refcount_table +
                                         refblock_index * s->cluster_size);

-            ret = bdrv_pwrite(bs->file, refblock_offset, on_disk_refblock,
-                              s->cluster_size);
+            ret = bdrv_pwrite(bs->file, refblock_offset, s->cluster_size,
+                              on_disk_refblock, 0);
             if (ret < 0) {
                 error_setg_errno(errp, -ret, "ERROR writing refblock");
                 return ret;
@@ -2733,8 +2733,8 @@ static int rebuild_refcount_structure(BlockDriverState *bs,
     }

     assert(reftable_length < INT_MAX);
-    ret = bdrv_pwrite(bs->file, reftable_offset, on_disk_reftable,
-                      reftable_length);
+    ret = bdrv_pwrite(bs->file, reftable_offset, reftable_length,
+                      on_disk_reftable, 0);
     if (ret < 0) {
         error_setg_errno(errp, -ret, "ERROR writing reftable");
         goto fail;
@@ -2746,8 +2746,8 @@ static int rebuild_refcount_structure(BlockDriverState *bs,
         cpu_to_be32(reftable_clusters);
     ret = bdrv_pwrite_sync(bs->file,
                            offsetof(QCowHeader, refcount_table_offset),
-                           &reftable_offset_and_clusters,
-                           sizeof(reftable_offset_and_clusters));
+                           sizeof(reftable_offset_and_clusters),
+                           &reftable_offset_and_clusters, 0);
     if (ret < 0) {
         error_setg_errno(errp, -ret, "ERROR setting reftable");
         goto fail;
@@ -3009,7 +3009,7 @@ int qcow2_check_metadata_overlap(BlockDriverState *bs, int ign, int64_t offset,
             return -ENOMEM;
         }

-        ret = bdrv_pread(bs->file, l1_ofs, l1, l1_sz2);
+        ret = bdrv_pread(bs->file, l1_ofs, l1_sz2, l1, 0);
         if (ret < 0) {
             g_free(l1);
             return ret;
@@ -3180,7 +3180,7 @@ static int flush_refblock(BlockDriverState *bs, uint64_t **reftable,
             return ret;
         }

-        ret = bdrv_pwrite(bs->file, offset, refblock, s->cluster_size);
+        ret = bdrv_pwrite(bs->file, offset, s->cluster_size, refblock, 0);
         if (ret < 0) {
             error_setg_errno(errp, -ret, "Failed to write refblock");
             return ret;
@@ -3452,8 +3452,9 @@ int qcow2_change_refcount_order(BlockDriverState *bs, int refcount_order,
         cpu_to_be64s(&new_reftable[i]);
     }

-    ret = bdrv_pwrite(bs->file, new_reftable_offset, new_reftable,
-                      new_reftable_size * REFTABLE_ENTRY_SIZE);
+    ret = bdrv_pwrite(bs->file, new_reftable_offset,
+                      new_reftable_size * REFTABLE_ENTRY_SIZE, new_reftable,
+                      0);

     for (i = 0; i < new_reftable_size; i++) {
         be64_to_cpus(&new_reftable[i]);
@@ -3656,8 +3657,9 @@ int qcow2_shrink_reftable(BlockDriverState *bs)
         reftable_tmp[i] = unused_block ? 0 : cpu_to_be64(s->refcount_table[i]);
     }

-    ret = bdrv_pwrite_sync(bs->file, s->refcount_table_offset, reftable_tmp,
-                           s->refcount_table_size * REFTABLE_ENTRY_SIZE);
+    ret = bdrv_pwrite_sync(bs->file, s->refcount_table_offset,
+                           s->refcount_table_size * REFTABLE_ENTRY_SIZE,
+                           reftable_tmp, 0);
     /*
      * If the write in the reftable failed the image may contain a partially
      * overwritten reftable. In this case it would be better to clear the

View File

@@ -108,7 +108,7 @@ static int qcow2_do_read_snapshots(BlockDriverState *bs, bool repair,
         /* Read statically sized part of the snapshot header */
         offset = ROUND_UP(offset, 8);
-        ret = bdrv_pread(bs->file, offset, &h, sizeof(h));
+        ret = bdrv_pread(bs->file, offset, sizeof(h), &h, 0);
         if (ret < 0) {
             error_setg_errno(errp, -ret, "Failed to read snapshot table");
             goto fail;
@@ -146,8 +146,8 @@ static int qcow2_do_read_snapshots(BlockDriverState *bs, bool repair,
         }

         /* Read known extra data */
-        ret = bdrv_pread(bs->file, offset, &extra,
-                         MIN(sizeof(extra), sn->extra_data_size));
+        ret = bdrv_pread(bs->file, offset,
+                         MIN(sizeof(extra), sn->extra_data_size), &extra, 0);
         if (ret < 0) {
             error_setg_errno(errp, -ret, "Failed to read snapshot table");
             goto fail;
@@ -184,8 +184,8 @@ static int qcow2_do_read_snapshots(BlockDriverState *bs, bool repair,
             /* Store unknown extra data */
             unknown_extra_data_size = sn->extra_data_size - sizeof(extra);
             sn->unknown_extra_data = g_malloc(unknown_extra_data_size);
-            ret = bdrv_pread(bs->file, offset, sn->unknown_extra_data,
-                             unknown_extra_data_size);
+            ret = bdrv_pread(bs->file, offset, unknown_extra_data_size,
+                             sn->unknown_extra_data, 0);
             if (ret < 0) {
                 error_setg_errno(errp, -ret,
                                  "Failed to read snapshot table");
@@ -196,7 +196,7 @@ static int qcow2_do_read_snapshots(BlockDriverState *bs, bool repair,
         /* Read snapshot ID */
         sn->id_str = g_malloc(id_str_size + 1);
-        ret = bdrv_pread(bs->file, offset, sn->id_str, id_str_size);
+        ret = bdrv_pread(bs->file, offset, id_str_size, sn->id_str, 0);
         if (ret < 0) {
             error_setg_errno(errp, -ret, "Failed to read snapshot table");
             goto fail;
@@ -206,7 +206,7 @@ static int qcow2_do_read_snapshots(BlockDriverState *bs, bool repair,
         /* Read snapshot name */
         sn->name = g_malloc(name_size + 1);
-        ret = bdrv_pread(bs->file, offset, sn->name, name_size);
+        ret = bdrv_pread(bs->file, offset, name_size, sn->name, 0);
         if (ret < 0) {
             error_setg_errno(errp, -ret, "Failed to read snapshot table");
             goto fail;
@@ -349,13 +349,13 @@ int qcow2_write_snapshots(BlockDriverState *bs)
         h.name_size = cpu_to_be16(name_size);
         offset = ROUND_UP(offset, 8);
-        ret = bdrv_pwrite(bs->file, offset, &h, sizeof(h));
+        ret = bdrv_pwrite(bs->file, offset, sizeof(h), &h, 0);
         if (ret < 0) {
             goto fail;
         }
         offset += sizeof(h);

-        ret = bdrv_pwrite(bs->file, offset, &extra, sizeof(extra));
+        ret = bdrv_pwrite(bs->file, offset, sizeof(extra), &extra, 0);
         if (ret < 0) {
             goto fail;
         }
@@ -369,21 +369,21 @@ int qcow2_write_snapshots(BlockDriverState *bs)
             assert(unknown_extra_data_size <= BDRV_REQUEST_MAX_BYTES);
             assert(sn->unknown_extra_data);

-            ret = bdrv_pwrite(bs->file, offset, sn->unknown_extra_data,
-                              unknown_extra_data_size);
+            ret = bdrv_pwrite(bs->file, offset, unknown_extra_data_size,
+                              sn->unknown_extra_data, 0);
             if (ret < 0) {
                 goto fail;
             }
             offset += unknown_extra_data_size;
         }

-        ret = bdrv_pwrite(bs->file, offset, sn->id_str, id_str_size);
+        ret = bdrv_pwrite(bs->file, offset, id_str_size, sn->id_str, 0);
         if (ret < 0) {
             goto fail;
         }
         offset += id_str_size;

-        ret = bdrv_pwrite(bs->file, offset, sn->name, name_size);
+        ret = bdrv_pwrite(bs->file, offset, name_size, sn->name, 0);
         if (ret < 0) {
             goto fail;
         }
@@ -406,7 +406,7 @@ int qcow2_write_snapshots(BlockDriverState *bs)
     header_data.snapshots_offset = cpu_to_be64(snapshots_offset);

     ret = bdrv_pwrite_sync(bs->file, offsetof(QCowHeader, nb_snapshots),
-                           &header_data, sizeof(header_data));
+                           sizeof(header_data), &header_data, 0);
     if (ret < 0) {
         goto fail;
     }
@@ -442,7 +442,8 @@ int coroutine_fn qcow2_check_read_snapshot_table(BlockDriverState *bs,
     /* qcow2_do_open() discards this information in check mode */
     ret = bdrv_pread(bs->file, offsetof(QCowHeader, nb_snapshots),
-                     &snapshot_table_pointer, sizeof(snapshot_table_pointer));
+                     sizeof(snapshot_table_pointer), &snapshot_table_pointer,
+                     0);
     if (ret < 0) {
         result->check_errors++;
         fprintf(stderr, "ERROR failed to read the snapshot table pointer from "
@@ -511,9 +512,9 @@ int coroutine_fn qcow2_check_read_snapshot_table(BlockDriverState *bs,
         assert(fix & BDRV_FIX_ERRORS);

         snapshot_table_pointer.nb_snapshots = cpu_to_be32(s->nb_snapshots);
-        ret = bdrv_pwrite_sync(bs->file, offsetof(QCowHeader, nb_snapshots),
-                               &snapshot_table_pointer.nb_snapshots,
-                               sizeof(snapshot_table_pointer.nb_snapshots));
+        ret = bdrv_co_pwrite_sync(bs->file, offsetof(QCowHeader, nb_snapshots),
+                                  sizeof(snapshot_table_pointer.nb_snapshots),
+                                  &snapshot_table_pointer.nb_snapshots, 0);
         if (ret < 0) {
             result->check_errors++;
             fprintf(stderr, "ERROR failed to update the snapshot count in the "
@@ -693,8 +694,8 @@ int qcow2_snapshot_create(BlockDriverState *bs, QEMUSnapshotInfo *sn_info)
         goto fail;
     }

-    ret = bdrv_pwrite(bs->file, sn->l1_table_offset, l1_table,
-                      s->l1_size * L1E_SIZE);
+    ret = bdrv_pwrite(bs->file, sn->l1_table_offset, s->l1_size * L1E_SIZE,
+                      l1_table, 0);
     if (ret < 0) {
         goto fail;
     }
@@ -829,8 +830,8 @@ int qcow2_snapshot_goto(BlockDriverState *bs, const char *snapshot_id)
         goto fail;
     }

-    ret = bdrv_pread(bs->file, sn->l1_table_offset,
-                     sn_l1_table, sn_l1_bytes);
+    ret = bdrv_pread(bs->file, sn->l1_table_offset, sn_l1_bytes, sn_l1_table,
+                     0);
     if (ret < 0) {
         goto fail;
     }
@@ -848,8 +849,8 @@ int qcow2_snapshot_goto(BlockDriverState *bs, const char *snapshot_id)
         goto fail;
     }

-    ret = bdrv_pwrite_sync(bs->file, s->l1_table_offset, sn_l1_table,
-                           cur_l1_bytes);
+    ret = bdrv_pwrite_sync(bs->file, s->l1_table_offset, cur_l1_bytes,
+                           sn_l1_table, 0);
     if (ret < 0) {
goto fail; goto fail;
} }
@ -1051,8 +1052,8 @@ int qcow2_snapshot_load_tmp(BlockDriverState *bs,
return -ENOMEM; return -ENOMEM;
} }
ret = bdrv_pread(bs->file, sn->l1_table_offset, ret = bdrv_pread(bs->file, sn->l1_table_offset, new_l1_bytes,
new_l1_table, new_l1_bytes); new_l1_table, 0);
if (ret < 0) { if (ret < 0) {
error_setg(errp, "Failed to read l1 table for snapshot"); error_setg(errp, "Failed to read l1 table for snapshot");
qemu_vfree(new_l1_table); qemu_vfree(new_l1_table);
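All of the hunks above are the same mechanical conversion: bdrv_pread()/bdrv_pwrite() (and their _sync variants) move from `(child, offset, buf, bytes)` to `(child, offset, bytes, buf, flags)`, and success is reported as 0 rather than a byte count. A minimal sketch of the new convention, using stub types and a fake in-memory backing file rather than the real QEMU API:

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Stand-in types; the real QEMU BdrvChild is opaque and the flag type is
 * an enum of BDRV_REQ_* bits. */
typedef struct BdrvChild {
    uint8_t data[64];               /* fake image contents */
} BdrvChild;
typedef unsigned BdrvRequestFlags;

/* New-order signature: (child, offset, bytes, buf, flags); returns 0/-errno. */
static int bdrv_pwrite(BdrvChild *child, int64_t offset, int64_t bytes,
                       const void *buf, BdrvRequestFlags flags)
{
    (void)flags;
    if (offset < 0 || bytes < 0 ||
        offset + bytes > (int64_t)sizeof(child->data)) {
        return -EIO;
    }
    memcpy(child->data + offset, buf, (size_t)bytes);
    return 0;                       /* success is 0, not a byte count */
}

static int bdrv_pread(BdrvChild *child, int64_t offset, int64_t bytes,
                      void *buf, BdrvRequestFlags flags)
{
    (void)flags;
    if (offset < 0 || bytes < 0 ||
        offset + bytes > (int64_t)sizeof(child->data)) {
        return -EIO;
    }
    memcpy(buf, child->data + offset, (size_t)bytes);
    return 0;
}

/* Round-trip one value the way the converted qcow2 callers do. */
static int roundtrip(void)
{
    BdrvChild file = {0};
    uint64_t out = 0x1122334455667788ULL, in = 0;
    int ret;

    ret = bdrv_pwrite(&file, 8, sizeof(out), &out, 0);  /* bytes before buf */
    if (ret < 0) {
        return ret;
    }
    ret = bdrv_pread(&file, 8, sizeof(in), &in, 0);
    if (ret < 0) {
        return ret;
    }
    return in == out ? 0 : -EIO;
}

/* A request past the end of the fake image fails with a negative errno. */
static int oob_read(void)
{
    BdrvChild file = {0};
    uint8_t buf[16];
    return bdrv_pread(&file, 56, sizeof(buf), buf, 0);  /* 56 + 16 > 64 */
}
```

The stub only mirrors the argument order and the 0/-errno return convention; the real functions go through the block layer and accept request flags.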


@@ -94,9 +94,9 @@ static int qcow2_probe(const uint8_t *buf, int buf_size, const char *filename)
 }
-static ssize_t qcow2_crypto_hdr_read_func(QCryptoBlock *block, size_t offset,
-                                          uint8_t *buf, size_t buflen,
-                                          void *opaque, Error **errp)
+static int qcow2_crypto_hdr_read_func(QCryptoBlock *block, size_t offset,
+                                      uint8_t *buf, size_t buflen,
+                                      void *opaque, Error **errp)
 {
     BlockDriverState *bs = opaque;
     BDRVQcow2State *s = bs->opaque;
@@ -107,18 +107,18 @@ static ssize_t qcow2_crypto_hdr_read_func(QCryptoBlock *block, size_t offset,
         return -1;
     }
-    ret = bdrv_pread(bs->file,
-                     s->crypto_header.offset + offset, buf, buflen);
+    ret = bdrv_pread(bs->file, s->crypto_header.offset + offset, buflen, buf,
+                     0);
     if (ret < 0) {
         error_setg_errno(errp, -ret, "Could not read encryption header");
         return -1;
     }
-    return ret;
+    return 0;
 }
-static ssize_t qcow2_crypto_hdr_init_func(QCryptoBlock *block, size_t headerlen,
-                                          void *opaque, Error **errp)
+static int qcow2_crypto_hdr_init_func(QCryptoBlock *block, size_t headerlen,
+                                      void *opaque, Error **errp)
 {
     BlockDriverState *bs = opaque;
     BDRVQcow2State *s = bs->opaque;
@@ -151,13 +151,13 @@ static ssize_t qcow2_crypto_hdr_init_func(QCryptoBlock *block, size_t headerlen,
         return -1;
     }
-    return ret;
+    return 0;
 }
-static ssize_t qcow2_crypto_hdr_write_func(QCryptoBlock *block, size_t offset,
-                                           const uint8_t *buf, size_t buflen,
-                                           void *opaque, Error **errp)
+static int qcow2_crypto_hdr_write_func(QCryptoBlock *block, size_t offset,
+                                       const uint8_t *buf, size_t buflen,
+                                       void *opaque, Error **errp)
 {
     BlockDriverState *bs = opaque;
     BDRVQcow2State *s = bs->opaque;
@@ -168,13 +168,13 @@ static ssize_t qcow2_crypto_hdr_write_func(QCryptoBlock *block, size_t offset,
         return -1;
     }
-    ret = bdrv_pwrite(bs->file,
-                      s->crypto_header.offset + offset, buf, buflen);
+    ret = bdrv_pwrite(bs->file, s->crypto_header.offset + offset, buflen, buf,
+                      0);
     if (ret < 0) {
         error_setg_errno(errp, -ret, "Could not read encryption header");
         return -1;
     }
-    return ret;
+    return 0;
 }
 static QDict*
@@ -227,7 +227,7 @@ static int qcow2_read_extensions(BlockDriverState *bs, uint64_t start_offset,
         printf("attempting to read extended header in offset %lu\n", offset);
 #endif
-        ret = bdrv_pread(bs->file, offset, &ext, sizeof(ext));
+        ret = bdrv_pread(bs->file, offset, sizeof(ext), &ext, 0);
         if (ret < 0) {
             error_setg_errno(errp, -ret, "qcow2_read_extension: ERROR: "
                              "pread fail from offset %" PRIu64, offset);
@@ -255,7 +255,7 @@ static int qcow2_read_extensions(BlockDriverState *bs, uint64_t start_offset,
                         sizeof(bs->backing_format));
                 return 2;
             }
-            ret = bdrv_pread(bs->file, offset, bs->backing_format, ext.len);
+            ret = bdrv_pread(bs->file, offset, ext.len, bs->backing_format, 0);
             if (ret < 0) {
                 error_setg_errno(errp, -ret, "ERROR: ext_backing_format: "
                                  "Could not read format name");
@@ -271,7 +271,7 @@ static int qcow2_read_extensions(BlockDriverState *bs, uint64_t start_offset,
         case QCOW2_EXT_MAGIC_FEATURE_TABLE:
             if (p_feature_table != NULL) {
                 void *feature_table = g_malloc0(ext.len + 2 * sizeof(Qcow2Feature));
-                ret = bdrv_pread(bs->file, offset , feature_table, ext.len);
+                ret = bdrv_pread(bs->file, offset, ext.len, feature_table, 0);
                 if (ret < 0) {
                     error_setg_errno(errp, -ret, "ERROR: ext_feature_table: "
                                      "Could not read table");
@@ -296,7 +296,7 @@ static int qcow2_read_extensions(BlockDriverState *bs, uint64_t start_offset,
                 return -EINVAL;
             }
-            ret = bdrv_pread(bs->file, offset, &s->crypto_header, ext.len);
+            ret = bdrv_pread(bs->file, offset, ext.len, &s->crypto_header, 0);
             if (ret < 0) {
                 error_setg_errno(errp, -ret,
                                  "Unable to read CRYPTO header extension");
@@ -352,7 +352,7 @@ static int qcow2_read_extensions(BlockDriverState *bs, uint64_t start_offset,
                 break;
             }
-            ret = bdrv_pread(bs->file, offset, &bitmaps_ext, ext.len);
+            ret = bdrv_pread(bs->file, offset, ext.len, &bitmaps_ext, 0);
             if (ret < 0) {
                 error_setg_errno(errp, -ret, "bitmaps_ext: "
                                  "Could not read ext header");
@@ -416,7 +416,7 @@ static int qcow2_read_extensions(BlockDriverState *bs, uint64_t start_offset,
         case QCOW2_EXT_MAGIC_DATA_FILE:
         {
             s->image_data_file = g_malloc0(ext.len + 1);
-            ret = bdrv_pread(bs->file, offset, s->image_data_file, ext.len);
+            ret = bdrv_pread(bs->file, offset, ext.len, s->image_data_file, 0);
             if (ret < 0) {
                 error_setg_errno(errp, -ret,
                                  "ERROR: Could not read data file name");
@@ -440,7 +440,7 @@ static int qcow2_read_extensions(BlockDriverState *bs, uint64_t start_offset,
             uext->len = ext.len;
             QLIST_INSERT_HEAD(&s->unknown_header_ext, uext, next);
-            ret = bdrv_pread(bs->file, offset , uext->data, uext->len);
+            ret = bdrv_pread(bs->file, offset, uext->len, uext->data, 0);
             if (ret < 0) {
                 error_setg_errno(errp, -ret, "ERROR: unknown extension: "
                                  "Could not read data");
@@ -516,12 +516,9 @@ int qcow2_mark_dirty(BlockDriverState *bs)
     }
     val = cpu_to_be64(s->incompatible_features | QCOW2_INCOMPAT_DIRTY);
-    ret = bdrv_pwrite(bs->file, offsetof(QCowHeader, incompatible_features),
-                      &val, sizeof(val));
-    if (ret < 0) {
-        return ret;
-    }
-    ret = bdrv_flush(bs->file->bs);
+    ret = bdrv_pwrite_sync(bs->file,
+                           offsetof(QCowHeader, incompatible_features),
+                           sizeof(val), &val, 0);
     if (ret < 0) {
         return ret;
     }
@@ -1308,7 +1305,7 @@ static int coroutine_fn qcow2_do_open(BlockDriverState *bs, QDict *options,
     uint64_t l1_vm_state_index;
     bool update_header = false;
-    ret = bdrv_pread(bs->file, 0, &header, sizeof(header));
+    ret = bdrv_pread(bs->file, 0, sizeof(header), &header, 0);
     if (ret < 0) {
         error_setg_errno(errp, -ret, "Could not read qcow2 header");
         goto fail;
@@ -1384,8 +1381,9 @@ static int coroutine_fn qcow2_do_open(BlockDriverState *bs, QDict *options,
     if (header.header_length > sizeof(header)) {
         s->unknown_header_fields_size = header.header_length - sizeof(header);
         s->unknown_header_fields = g_malloc(s->unknown_header_fields_size);
-        ret = bdrv_pread(bs->file, sizeof(header), s->unknown_header_fields,
-                         s->unknown_header_fields_size);
+        ret = bdrv_pread(bs->file, sizeof(header),
+                         s->unknown_header_fields_size,
+                         s->unknown_header_fields, 0);
         if (ret < 0) {
             error_setg_errno(errp, -ret, "Could not read unknown qcow2 header "
                              "fields");
@@ -1580,8 +1578,8 @@ static int coroutine_fn qcow2_do_open(BlockDriverState *bs, QDict *options,
         ret = -ENOMEM;
         goto fail;
     }
-    ret = bdrv_pread(bs->file, s->l1_table_offset, s->l1_table,
-                     s->l1_size * L1E_SIZE);
+    ret = bdrv_pread(bs->file, s->l1_table_offset, s->l1_size * L1E_SIZE,
+                     s->l1_table, 0);
     if (ret < 0) {
         error_setg_errno(errp, -ret, "Could not read L1 table");
         goto fail;
@@ -1698,8 +1696,8 @@ static int coroutine_fn qcow2_do_open(BlockDriverState *bs, QDict *options,
         ret = -EINVAL;
         goto fail;
     }
-    ret = bdrv_pread(bs->file, header.backing_file_offset,
-                     bs->auto_backing_file, len);
+    ret = bdrv_pread(bs->file, header.backing_file_offset, len,
+                     bs->auto_backing_file, 0);
     if (ret < 0) {
         error_setg_errno(errp, -ret, "Could not read backing file name");
         goto fail;
@@ -3081,7 +3079,7 @@ int qcow2_update_header(BlockDriverState *bs)
     }
     /* Write the new header */
-    ret = bdrv_pwrite(bs->file, 0, header, s->cluster_size);
+    ret = bdrv_pwrite(bs->file, 0, s->cluster_size, header, 0);
     if (ret < 0) {
         goto fail;
     }
@@ -3668,7 +3666,7 @@ qcow2_co_create(BlockdevCreateOptions *create_options, Error **errp)
             cpu_to_be64(QCOW2_INCOMPAT_EXTL2);
     }
-    ret = blk_pwrite(blk, 0, header, cluster_size, 0);
+    ret = blk_pwrite(blk, 0, cluster_size, header, 0);
     g_free(header);
     if (ret < 0) {
         error_setg_errno(errp, -ret, "Could not write qcow2 header");
@@ -3678,7 +3676,7 @@ qcow2_co_create(BlockdevCreateOptions *create_options, Error **errp)
     /* Write a refcount table with one refcount block */
     refcount_table = g_malloc0(2 * cluster_size);
     refcount_table[0] = cpu_to_be64(2 * cluster_size);
-    ret = blk_pwrite(blk, cluster_size, refcount_table, 2 * cluster_size, 0);
+    ret = blk_pwrite(blk, cluster_size, 2 * cluster_size, refcount_table, 0);
     g_free(refcount_table);
     if (ret < 0) {
@@ -4550,8 +4548,8 @@ static int coroutine_fn qcow2_co_truncate(BlockDriverState *bs, int64_t offset,
     /* write updated header.size */
     offset = cpu_to_be64(offset);
-    ret = bdrv_pwrite_sync(bs->file, offsetof(QCowHeader, size),
-                           &offset, sizeof(offset));
+    ret = bdrv_co_pwrite_sync(bs->file, offsetof(QCowHeader, size),
+                              sizeof(offset), &offset, 0);
     if (ret < 0) {
         error_setg_errno(errp, -ret, "Failed to update the image size");
         goto fail;
@@ -4828,7 +4826,7 @@ static int make_completely_empty(BlockDriverState *bs)
     l1_ofs_rt_ofs_cls.reftable_offset = cpu_to_be64(s->cluster_size);
     l1_ofs_rt_ofs_cls.reftable_clusters = cpu_to_be32(1);
     ret = bdrv_pwrite_sync(bs->file, offsetof(QCowHeader, l1_table_offset),
-                           &l1_ofs_rt_ofs_cls, sizeof(l1_ofs_rt_ofs_cls));
+                           sizeof(l1_ofs_rt_ofs_cls), &l1_ofs_rt_ofs_cls, 0);
     if (ret < 0) {
         goto fail_broken_refcounts;
     }
@@ -4859,8 +4857,8 @@ static int make_completely_empty(BlockDriverState *bs)
     /* Enter the first refblock into the reftable */
     rt_entry = cpu_to_be64(2 * s->cluster_size);
-    ret = bdrv_pwrite_sync(bs->file, s->cluster_size,
-                           &rt_entry, sizeof(rt_entry));
+    ret = bdrv_pwrite_sync(bs->file, s->cluster_size, sizeof(rt_entry),
+                           &rt_entry, 0);
     if (ret < 0) {
         goto fail_broken_refcounts;
     }


@@ -87,14 +87,9 @@ static void qed_header_cpu_to_le(const QEDHeader *cpu, QEDHeader *le)
 int qed_write_header_sync(BDRVQEDState *s)
 {
     QEDHeader le;
-    int ret;
     qed_header_cpu_to_le(&s->header, &le);
-    ret = bdrv_pwrite(s->bs->file, 0, &le, sizeof(le));
-    if (ret != sizeof(le)) {
-        return ret;
-    }
-    return 0;
+    return bdrv_pwrite(s->bs->file, 0, sizeof(le), &le, 0);
 }
 /**
@@ -207,7 +202,7 @@ static int qed_read_string(BdrvChild *file, uint64_t offset, size_t n,
     if (n >= buflen) {
         return -EINVAL;
     }
-    ret = bdrv_pread(file, offset, buf, n);
+    ret = bdrv_pread(file, offset, n, buf, 0);
     if (ret < 0) {
         return ret;
     }
@@ -392,7 +387,7 @@ static int coroutine_fn bdrv_qed_do_open(BlockDriverState *bs, QDict *options,
     int64_t file_size;
     int ret;
-    ret = bdrv_pread(bs->file, 0, &le_header, sizeof(le_header));
+    ret = bdrv_pread(bs->file, 0, sizeof(le_header), &le_header, 0);
     if (ret < 0) {
         error_setg(errp, "Failed to read QED header");
         return ret;
@@ -710,18 +705,18 @@ static int coroutine_fn bdrv_qed_co_create(BlockdevCreateOptions *opts,
     }
     qed_header_cpu_to_le(&header, &le_header);
-    ret = blk_pwrite(blk, 0, &le_header, sizeof(le_header), 0);
+    ret = blk_pwrite(blk, 0, sizeof(le_header), &le_header, 0);
     if (ret < 0) {
         goto out;
     }
-    ret = blk_pwrite(blk, sizeof(le_header), qed_opts->backing_file,
-                     header.backing_filename_size, 0);
+    ret = blk_pwrite(blk, sizeof(le_header), header.backing_filename_size,
+                     qed_opts->backing_file, 0);
     if (ret < 0) {
         goto out;
     }
     l1_table = g_malloc0(l1_size);
-    ret = blk_pwrite(blk, header.l1_table_offset, l1_table, l1_size, 0);
+    ret = blk_pwrite(blk, header.l1_table_offset, l1_size, l1_table, 0);
     if (ret < 0) {
         goto out;
     }
@@ -1545,7 +1540,7 @@ static int bdrv_qed_change_backing_file(BlockDriverState *bs,
     }
     /* Write new header */
-    ret = bdrv_pwrite_sync(bs->file, 0, buffer, buffer_len);
+    ret = bdrv_pwrite_sync(bs->file, 0, buffer_len, buffer, 0);
     g_free(buffer);
     if (ret == 0) {
         memcpy(&s->header, &new_header, sizeof(new_header));
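The qed_write_header_sync() hunk also illustrates the return-value change: the old helper returned the number of bytes written, so the caller compared against sizeof(le); the new one returns 0 or -errno, so the result can be returned directly. A sketch with stub write functions (the header size here is illustrative, not the real QED header layout):

```c
#include <stddef.h>

enum { HEADER_SIZE = 64 };          /* illustrative, not sizeof(QEDHeader) */

/* Old convention: a successful write returned the byte count. */
static int pwrite_old_style(size_t bytes)
{
    return (int)bytes;
}

/* New convention: a successful write returns 0. */
static int pwrite_new_style(size_t bytes)
{
    (void)bytes;
    return 0;
}

/* Before: success had to be detected by comparing against the size... */
static int write_header_sync_before(void)
{
    int ret = pwrite_old_style(HEADER_SIZE);
    if (ret != HEADER_SIZE) {
        return ret;
    }
    return 0;
}

/* ...after: the helper's 0/-errno result is simply passed through,
 * which is why the hunk above collapses to a single return statement. */
static int write_header_sync_after(void)
{
    return pwrite_new_style(HEADER_SIZE);
}
```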


@@ -831,6 +831,26 @@ static int qemu_rbd_connect(rados_t *cluster, rados_ioctx_t *io_ctx,
         error_setg_errno(errp, -r, "error opening pool %s", opts->pool);
         goto failed_shutdown;
     }
+
+#ifdef HAVE_RBD_NAMESPACE_EXISTS
+    if (opts->has_q_namespace && strlen(opts->q_namespace) > 0) {
+        bool exists;
+
+        r = rbd_namespace_exists(*io_ctx, opts->q_namespace, &exists);
+        if (r < 0) {
+            error_setg_errno(errp, -r, "error checking namespace");
+            goto failed_ioctx_destroy;
+        }
+
+        if (!exists) {
+            error_setg(errp, "namespace '%s' does not exist",
+                       opts->q_namespace);
+            r = -ENOENT;
+            goto failed_ioctx_destroy;
+        }
+    }
+#endif
+
     /*
      * Set the namespace after opening the io context on the pool,
      * if nspace == NULL or if nspace == "", it is just as we did nothing
@@ -840,6 +860,10 @@ static int qemu_rbd_connect(rados_t *cluster, rados_ioctx_t *io_ctx,
     r = 0;
     goto out;
+#ifdef HAVE_RBD_NAMESPACE_EXISTS
+failed_ioctx_destroy:
+    rados_ioctx_destroy(*io_ctx);
+#endif
 failed_shutdown:
     rados_shutdown(*cluster);
 out:
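The rbd change above follows a common QEMU error-unwind shape: a new cleanup label is inserted between the existing ones, compiled in only when the optional librbd symbol is available, and failure paths fall through the labels in reverse order of setup. A sketch with stub functions standing in for librados/librbd (HAVE_RBD_NAMESPACE_EXISTS is force-defined here purely for illustration):

```c
#include <errno.h>
#include <stdbool.h>

#define HAVE_RBD_NAMESPACE_EXISTS 1 /* pretend the configure test passed */

static int ioctx_destroyed;         /* records that cleanup ran */
static int cluster_shutdown;

/* Stub for rbd_namespace_exists(): only the name "ok" exists. */
static int namespace_exists(const char *ns, bool *exists)
{
    *exists = (ns[0] == 'o');
    return 0;
}

static int connect_sketch(const char *ns)
{
    int r = 0;
#ifdef HAVE_RBD_NAMESPACE_EXISTS
    bool exists;

    r = namespace_exists(ns, &exists);
    if (r < 0) {
        goto failed_ioctx_destroy;
    }
    if (!exists) {
        r = -ENOENT;
        goto failed_ioctx_destroy;
    }
#endif
    goto out;                       /* success: keep io_ctx and cluster */

#ifdef HAVE_RBD_NAMESPACE_EXISTS
failed_ioctx_destroy:               /* unwind in reverse order of setup... */
    ioctx_destroyed = 1;
#endif
    /* ...then fall through to the older cleanup (failed_shutdown). */
    cluster_shutdown = 1;
out:
    return r;
}
```

The #ifdef around the label matters: without the feature, no goto targets it, and an unreferenced label would draw a compiler warning.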


@@ -172,6 +172,8 @@ nbd_read_reply_entry_fail(int ret, const char *err) "ret = %d, err: %s"
 nbd_co_request_fail(uint64_t from, uint32_t len, uint64_t handle, uint16_t flags, uint16_t type, const char *name, int ret, const char *err) "Request failed { .from = %" PRIu64", .len = %" PRIu32 ", .handle = %" PRIu64 ", .flags = 0x%" PRIx16 ", .type = %" PRIu16 " (%s) } ret = %d, err: %s"
 nbd_client_handshake(const char *export_name) "export '%s'"
 nbd_client_handshake_success(const char *export_name) "export '%s'"
+nbd_reconnect_attempt(unsigned in_flight) "in_flight %u"
+nbd_reconnect_attempt_result(int ret, unsigned in_flight) "ret %d in_flight %u"
 # ssh.c
 ssh_restart_coroutine(void *co) "co=%p"


@@ -385,7 +385,7 @@ static int vdi_open(BlockDriverState *bs, QDict *options, int flags,
     logout("\n");
-    ret = bdrv_pread(bs->file, 0, &header, sizeof(header));
+    ret = bdrv_pread(bs->file, 0, sizeof(header), &header, 0);
     if (ret < 0) {
         goto fail;
     }
@@ -485,8 +485,8 @@ static int vdi_open(BlockDriverState *bs, QDict *options, int flags,
         goto fail;
     }
-    ret = bdrv_pread(bs->file, header.offset_bmap, s->bmap,
-                     bmap_size * SECTOR_SIZE);
+    ret = bdrv_pread(bs->file, header.offset_bmap, bmap_size * SECTOR_SIZE,
+                     s->bmap, 0);
     if (ret < 0) {
         goto fail_free_bmap;
     }
@@ -664,7 +664,7 @@ vdi_co_pwritev(BlockDriverState *bs, int64_t offset, int64_t bytes,
          * so this full-cluster write does not overlap a partial write
          * of the same cluster, issued from the "else" branch.
          */
-        ret = bdrv_pwrite(bs->file, data_offset, block, s->block_size);
+        ret = bdrv_pwrite(bs->file, data_offset, s->block_size, block, 0);
         qemu_co_rwlock_unlock(&s->bmap_lock);
     } else {
 nonallocating_write:
@@ -709,7 +709,7 @@ nonallocating_write:
         assert(VDI_IS_ALLOCATED(bmap_first));
         *header = s->header;
         vdi_header_to_le(header);
-        ret = bdrv_pwrite(bs->file, 0, header, sizeof(*header));
+        ret = bdrv_pwrite(bs->file, 0, sizeof(*header), header, 0);
         g_free(header);
         if (ret < 0) {
@@ -726,11 +726,11 @@ nonallocating_write:
         base = ((uint8_t *)&s->bmap[0]) + bmap_first * SECTOR_SIZE;
         logout("will write %u block map sectors starting from entry %u\n",
                n_sectors, bmap_first);
-        ret = bdrv_pwrite(bs->file, offset * SECTOR_SIZE, base,
-                          n_sectors * SECTOR_SIZE);
+        ret = bdrv_pwrite(bs->file, offset * SECTOR_SIZE,
+                          n_sectors * SECTOR_SIZE, base, 0);
     }
-    return ret < 0 ? ret : 0;
+    return ret;
 }
 static int coroutine_fn vdi_co_do_create(BlockdevCreateOptions *create_options,
@@ -845,7 +845,7 @@ static int coroutine_fn vdi_co_do_create(BlockdevCreateOptions *create_options,
         vdi_header_print(&header);
     }
     vdi_header_to_le(&header);
-    ret = blk_pwrite(blk, offset, &header, sizeof(header), 0);
+    ret = blk_pwrite(blk, offset, sizeof(header), &header, 0);
     if (ret < 0) {
         error_setg(errp, "Error writing header");
         goto exit;
@@ -866,7 +866,7 @@ static int coroutine_fn vdi_co_do_create(BlockdevCreateOptions *create_options,
             bmap[i] = VDI_UNALLOCATED;
         }
     }
-    ret = blk_pwrite(blk, offset, bmap, bmap_size, 0);
+    ret = blk_pwrite(blk, offset, bmap_size, bmap, 0);
     if (ret < 0) {
         error_setg(errp, "Error writing bmap");
         goto exit;


@@ -84,7 +84,7 @@ static int vhdx_log_peek_hdr(BlockDriverState *bs, VHDXLogEntries *log,
     offset = log->offset + read;
-    ret = bdrv_pread(bs->file, offset, hdr, sizeof(VHDXLogEntryHeader));
+    ret = bdrv_pread(bs->file, offset, sizeof(VHDXLogEntryHeader), hdr, 0);
     if (ret < 0) {
         goto exit;
     }
@@ -144,7 +144,7 @@ static int vhdx_log_read_sectors(BlockDriverState *bs, VHDXLogEntries *log,
         }
         offset = log->offset + read;
-        ret = bdrv_pread(bs->file, offset, buffer, VHDX_LOG_SECTOR_SIZE);
+        ret = bdrv_pread(bs->file, offset, VHDX_LOG_SECTOR_SIZE, buffer, 0);
         if (ret < 0) {
             goto exit;
         }
@@ -194,8 +194,8 @@ static int vhdx_log_write_sectors(BlockDriverState *bs, VHDXLogEntries *log,
             /* full */
             break;
         }
-        ret = bdrv_pwrite(bs->file, offset, buffer_tmp,
-                          VHDX_LOG_SECTOR_SIZE);
+        ret = bdrv_pwrite(bs->file, offset, VHDX_LOG_SECTOR_SIZE, buffer_tmp,
+                          0);
         if (ret < 0) {
             goto exit;
         }
@@ -466,8 +466,8 @@ static int vhdx_log_flush_desc(BlockDriverState *bs, VHDXLogDescriptor *desc,
     /* count is only > 1 if we are writing zeroes */
     for (i = 0; i < count; i++) {
-        ret = bdrv_pwrite_sync(bs->file, file_offset, buffer,
-                               VHDX_LOG_SECTOR_SIZE);
+        ret = bdrv_pwrite_sync(bs->file, file_offset, VHDX_LOG_SECTOR_SIZE,
+                               buffer, 0);
         if (ret < 0) {
             goto exit;
         }
@@ -970,8 +970,8 @@ static int vhdx_log_write(BlockDriverState *bs, BDRVVHDXState *s,
         if (i == 0 && leading_length) {
             /* partial sector at the front of the buffer */
-            ret = bdrv_pread(bs->file, file_offset, merged_sector,
-                             VHDX_LOG_SECTOR_SIZE);
+            ret = bdrv_pread(bs->file, file_offset, VHDX_LOG_SECTOR_SIZE,
+                             merged_sector, 0);
             if (ret < 0) {
                 goto exit;
             }
@@ -980,10 +980,9 @@ static int vhdx_log_write(BlockDriverState *bs, BDRVVHDXState *s,
             sector_write = merged_sector;
         } else if (i == sectors - 1 && trailing_length) {
             /* partial sector at the end of the buffer */
-            ret = bdrv_pread(bs->file,
-                             file_offset,
-                             merged_sector + trailing_length,
-                             VHDX_LOG_SECTOR_SIZE - trailing_length);
+            ret = bdrv_pread(bs->file, file_offset,
+                             VHDX_LOG_SECTOR_SIZE - trailing_length,
+                             merged_sector + trailing_length, 0);
             if (ret < 0) {
                 goto exit;
             }


@@ -326,7 +326,7 @@ static int vhdx_write_header(BdrvChild *file, VHDXHeader *hdr,
     buffer = qemu_blockalign(bs_file, VHDX_HEADER_SIZE);
     if (read) {
         /* if true, we can't assume the extra reserved bytes are 0 */
-        ret = bdrv_pread(file, offset, buffer, VHDX_HEADER_SIZE);
+        ret = bdrv_pread(file, offset, VHDX_HEADER_SIZE, buffer, 0);
         if (ret < 0) {
             goto exit;
         }
@@ -340,7 +340,7 @@ static int vhdx_write_header(BdrvChild *file, VHDXHeader *hdr,
     vhdx_header_le_export(hdr, header_le);
     vhdx_update_checksum(buffer, VHDX_HEADER_SIZE,
                          offsetof(VHDXHeader, checksum));
-    ret = bdrv_pwrite_sync(file, offset, header_le, sizeof(VHDXHeader));
+    ret = bdrv_pwrite_sync(file, offset, sizeof(VHDXHeader), header_le, 0);
 exit:
     qemu_vfree(buffer);
@@ -440,8 +440,8 @@ static void vhdx_parse_header(BlockDriverState *bs, BDRVVHDXState *s,
     /* We have to read the whole VHDX_HEADER_SIZE instead of
      * sizeof(VHDXHeader), because the checksum is over the whole
      * region */
-    ret = bdrv_pread(bs->file, VHDX_HEADER1_OFFSET, buffer,
-                     VHDX_HEADER_SIZE);
+    ret = bdrv_pread(bs->file, VHDX_HEADER1_OFFSET, VHDX_HEADER_SIZE, buffer,
+                     0);
     if (ret < 0) {
         goto fail;
     }
@@ -457,8 +457,8 @@ static void vhdx_parse_header(BlockDriverState *bs, BDRVVHDXState *s,
         }
     }
-    ret = bdrv_pread(bs->file, VHDX_HEADER2_OFFSET, buffer,
-                     VHDX_HEADER_SIZE);
+    ret = bdrv_pread(bs->file, VHDX_HEADER2_OFFSET, VHDX_HEADER_SIZE, buffer,
+                     0);
     if (ret < 0) {
         goto fail;
     }
@@ -531,8 +531,8 @@ static int vhdx_open_region_tables(BlockDriverState *bs, BDRVVHDXState *s)
      * whole block */
     buffer = qemu_blockalign(bs, VHDX_HEADER_BLOCK_SIZE);
-    ret = bdrv_pread(bs->file, VHDX_REGION_TABLE_OFFSET, buffer,
-                     VHDX_HEADER_BLOCK_SIZE);
+    ret = bdrv_pread(bs->file, VHDX_REGION_TABLE_OFFSET,
+                     VHDX_HEADER_BLOCK_SIZE, buffer, 0);
     if (ret < 0) {
         goto fail;
     }
@@ -644,8 +644,8 @@ static int vhdx_parse_metadata(BlockDriverState *bs, BDRVVHDXState *s)
     buffer = qemu_blockalign(bs, VHDX_METADATA_TABLE_MAX_SIZE);
-    ret = bdrv_pread(bs->file, s->metadata_rt.file_offset, buffer,
-                     VHDX_METADATA_TABLE_MAX_SIZE);
+    ret = bdrv_pread(bs->file, s->metadata_rt.file_offset,
+                     VHDX_METADATA_TABLE_MAX_SIZE, buffer, 0);
     if (ret < 0) {
         goto exit;
     }
@@ -750,8 +750,9 @@ static int vhdx_parse_metadata(BlockDriverState *bs, BDRVVHDXState *s)
     ret = bdrv_pread(bs->file,
                      s->metadata_entries.file_parameters_entry.offset
                      + s->metadata_rt.file_offset,
+                     sizeof(s->params),
                      &s->params,
-                     sizeof(s->params));
+                     0);
     if (ret < 0) {
         goto exit;
@@ -785,24 +786,27 @@ static int vhdx_parse_metadata(BlockDriverState *bs, BDRVVHDXState *s)
     ret = bdrv_pread(bs->file,
                      s->metadata_entries.virtual_disk_size_entry.offset
                      + s->metadata_rt.file_offset,
+                     sizeof(uint64_t),
                      &s->virtual_disk_size,
-                     sizeof(uint64_t));
+                     0);
     if (ret < 0) {
         goto exit;
     }
     ret = bdrv_pread(bs->file,
                      s->metadata_entries.logical_sector_size_entry.offset
                      + s->metadata_rt.file_offset,
+                     sizeof(uint32_t),
                      &s->logical_sector_size,
-                     sizeof(uint32_t));
+                     0);
     if (ret < 0) {
         goto exit;
     }
     ret = bdrv_pread(bs->file,
                      s->metadata_entries.phys_sector_size_entry.offset
                      + s->metadata_rt.file_offset,
+                     sizeof(uint32_t),
                      &s->physical_sector_size,
-                     sizeof(uint32_t));
+                     0);
     if (ret < 0) {
         goto exit;
     }
@@ -1010,7 +1014,7 @@ static int vhdx_open(BlockDriverState *bs, QDict *options, int flags,
     QLIST_INIT(&s->regions);
     /* validate the file signature */
-    ret = bdrv_pread(bs->file, 0, &signature, sizeof(uint64_t));
+    ret = bdrv_pread(bs->file, 0, sizeof(uint64_t), &signature, 0);
     if (ret < 0) {
         goto fail;
     }
@@ -1069,7 +1073,7 @@ static int vhdx_open(BlockDriverState *bs, QDict *options, int flags,
         goto fail;
     }
-    ret = bdrv_pread(bs->file, s->bat_offset, s->bat, s->bat_rt.length);
+    ret = bdrv_pread(bs->file, s->bat_offset, s->bat_rt.length, s->bat, 0);
     if (ret < 0) {
         goto fail;
     }
@@ -1661,13 +1665,13 @@ static int vhdx_create_new_metadata(BlockBackend *blk,
                                  VHDX_META_FLAGS_IS_VIRTUAL_DISK;
     vhdx_metadata_entry_le_export(&md_table_entry[4]);
-    ret = blk_pwrite(blk, metadata_offset, buffer, VHDX_HEADER_BLOCK_SIZE, 0);
+    ret = blk_pwrite(blk, metadata_offset, VHDX_HEADER_BLOCK_SIZE, buffer, 0);
     if (ret < 0) {
         goto exit;
     }
-    ret = blk_pwrite(blk, metadata_offset + (64 * KiB), entry_buffer,
-                     VHDX_METADATA_ENTRY_BUFFER_SIZE, 0);
+    ret = blk_pwrite(blk, metadata_offset + (64 * KiB),
+                     VHDX_METADATA_ENTRY_BUFFER_SIZE, entry_buffer, 0);
     if (ret < 0) {
         goto exit;
     }
@@ -1752,7 +1756,7 @@ static int vhdx_create_bat(BlockBackend *blk, BDRVVHDXState *s,
             s->bat[sinfo.bat_idx] = cpu_to_le64(s->bat[sinfo.bat_idx]);
             sector_num += s->sectors_per_block;
         }
-    ret = blk_pwrite(blk, file_offset, s->bat, length, 0);
+    ret = blk_pwrite(blk, file_offset, length, s->bat, 0);
     if (ret < 0) {
         error_setg_errno(errp, -ret, "Failed to write the BAT");
         goto exit;
@@ -1856,15 +1860,15 @@ static int vhdx_create_new_region_table(BlockBackend *blk,
     }
     /* Now write out the region headers to disk */
-    ret = blk_pwrite(blk, VHDX_REGION_TABLE_OFFSET, buffer,
-                     VHDX_HEADER_BLOCK_SIZE, 0);
+    ret = blk_pwrite(blk, VHDX_REGION_TABLE_OFFSET, VHDX_HEADER_BLOCK_SIZE,
+                     buffer, 0);
     if (ret < 0) {
         error_setg_errno(errp, -ret, "Failed to write first region table");
         goto exit;
     }
-    ret = blk_pwrite(blk, VHDX_REGION_TABLE2_OFFSET, buffer,
-                     VHDX_HEADER_BLOCK_SIZE, 0);
+    ret = blk_pwrite(blk, VHDX_REGION_TABLE2_OFFSET, VHDX_HEADER_BLOCK_SIZE,
+                     buffer, 0);
     if (ret < 0) {
         error_setg_errno(errp, -ret, "Failed to write second region table");
         goto exit;
@@ -2008,7 +2012,7 @@ static int coroutine_fn vhdx_co_create(BlockdevCreateOptions *opts,
     creator = g_utf8_to_utf16("QEMU v" QEMU_VERSION, -1, NULL,
                               &creator_items, NULL);
     signature = cpu_to_le64(VHDX_FILE_SIGNATURE);
-    ret = blk_pwrite(blk, VHDX_FILE_ID_OFFSET, &signature, sizeof(signature),
+    ret = blk_pwrite(blk, VHDX_FILE_ID_OFFSET, sizeof(signature), &signature,
                      0);
     if (ret < 0) {
         error_setg_errno(errp, -ret, "Failed to write file signature");
@@ -2016,7 +2020,7 @@ static int coroutine_fn vhdx_co_create(BlockdevCreateOptions *opts,
     }
     if (creator) {
-        ret = blk_pwrite(blk, VHDX_FILE_ID_OFFSET + sizeof(signature),
-                         creator, creator_items * sizeof(gunichar2), 0);
+        ret = blk_pwrite(blk, VHDX_FILE_ID_OFFSET + sizeof(signature),
+                         creator_items * sizeof(gunichar2), creator, 0);
         if (ret < 0) {
             error_setg_errno(errp, -ret, "Failed to write creator field");
             goto delete_and_exit;


@@ -307,7 +307,7 @@ static int vmdk_read_cid(BlockDriverState *bs, int parent, uint32_t *pcid)
 int ret;
 desc = g_malloc0(DESC_SIZE);
-ret = bdrv_pread(bs->file, s->desc_offset, desc, DESC_SIZE);
+ret = bdrv_pread(bs->file, s->desc_offset, DESC_SIZE, desc, 0);
 if (ret < 0) {
     goto out;
 }
@@ -348,7 +348,7 @@ static int vmdk_write_cid(BlockDriverState *bs, uint32_t cid)
 desc = g_malloc0(DESC_SIZE);
 tmp_desc = g_malloc0(DESC_SIZE);
-ret = bdrv_pread(bs->file, s->desc_offset, desc, DESC_SIZE);
+ret = bdrv_pread(bs->file, s->desc_offset, DESC_SIZE, desc, 0);
 if (ret < 0) {
     goto out;
 }
@@ -368,7 +368,7 @@ static int vmdk_write_cid(BlockDriverState *bs, uint32_t cid)
     pstrcat(desc, DESC_SIZE, tmp_desc);
 }
-ret = bdrv_pwrite_sync(bs->file, s->desc_offset, desc, DESC_SIZE);
+ret = bdrv_pwrite_sync(bs->file, s->desc_offset, DESC_SIZE, desc, 0);
 out:
 g_free(desc);
@@ -469,11 +469,10 @@ static int vmdk_parent_open(BlockDriverState *bs)
 int ret;
 desc = g_malloc0(DESC_SIZE + 1);
-ret = bdrv_pread(bs->file, s->desc_offset, desc, DESC_SIZE);
+ret = bdrv_pread(bs->file, s->desc_offset, DESC_SIZE, desc, 0);
 if (ret < 0) {
     goto out;
 }
-ret = 0;
 p_name = strstr(desc, "parentFileNameHint");
 if (p_name != NULL) {
@@ -589,10 +588,8 @@ static int vmdk_init_tables(BlockDriverState *bs, VmdkExtent *extent,
     return -ENOMEM;
 }
-ret = bdrv_pread(extent->file,
-                 extent->l1_table_offset,
-                 extent->l1_table,
-                 l1_size);
+ret = bdrv_pread(extent->file, extent->l1_table_offset, l1_size,
+                 extent->l1_table, 0);
 if (ret < 0) {
     bdrv_refresh_filename(extent->file->bs);
     error_setg_errno(errp, -ret,
@@ -616,10 +613,8 @@ static int vmdk_init_tables(BlockDriverState *bs, VmdkExtent *extent,
     ret = -ENOMEM;
     goto fail_l1;
 }
-ret = bdrv_pread(extent->file,
-                 extent->l1_backup_table_offset,
-                 extent->l1_backup_table,
-                 l1_size);
+ret = bdrv_pread(extent->file, extent->l1_backup_table_offset,
+                 l1_size, extent->l1_backup_table, 0);
 if (ret < 0) {
     bdrv_refresh_filename(extent->file->bs);
     error_setg_errno(errp, -ret,
@@ -651,7 +646,7 @@ static int vmdk_open_vmfs_sparse(BlockDriverState *bs,
 VMDK3Header header;
 VmdkExtent *extent = NULL;
-ret = bdrv_pread(file, sizeof(magic), &header, sizeof(header));
+ret = bdrv_pread(file, sizeof(magic), sizeof(header), &header, 0);
 if (ret < 0) {
     bdrv_refresh_filename(file->bs);
     error_setg_errno(errp, -ret,
@@ -815,7 +810,7 @@ static int vmdk_open_se_sparse(BlockDriverState *bs,
 assert(sizeof(const_header) == SECTOR_SIZE);
-ret = bdrv_pread(file, 0, &const_header, sizeof(const_header));
+ret = bdrv_pread(file, 0, sizeof(const_header), &const_header, 0);
 if (ret < 0) {
     bdrv_refresh_filename(file->bs);
     error_setg_errno(errp, -ret,
@@ -832,9 +827,8 @@ static int vmdk_open_se_sparse(BlockDriverState *bs,
 assert(sizeof(volatile_header) == SECTOR_SIZE);
-ret = bdrv_pread(file,
-                 const_header.volatile_header_offset * SECTOR_SIZE,
-                 &volatile_header, sizeof(volatile_header));
+ret = bdrv_pread(file, const_header.volatile_header_offset * SECTOR_SIZE,
+                 sizeof(volatile_header), &volatile_header, 0);
 if (ret < 0) {
     bdrv_refresh_filename(file->bs);
     error_setg_errno(errp, -ret,
@@ -904,13 +898,13 @@ static char *vmdk_read_desc(BdrvChild *file, uint64_t desc_offset, Error **errp)
 size = MIN(size, (1 << 20) - 1);  /* avoid unbounded allocation */
 buf = g_malloc(size + 1);
-ret = bdrv_pread(file, desc_offset, buf, size);
+ret = bdrv_pread(file, desc_offset, size, buf, 0);
 if (ret < 0) {
     error_setg_errno(errp, -ret, "Could not read from file");
     g_free(buf);
     return NULL;
 }
-buf[ret] = 0;
+buf[size] = 0;
 return buf;
 }
@@ -928,7 +922,7 @@ static int vmdk_open_vmdk4(BlockDriverState *bs,
 int64_t l1_backup_offset = 0;
 bool compressed;
-ret = bdrv_pread(file, sizeof(magic), &header, sizeof(header));
+ret = bdrv_pread(file, sizeof(magic), sizeof(header), &header, 0);
 if (ret < 0) {
     bdrv_refresh_filename(file->bs);
     error_setg_errno(errp, -ret,
@@ -979,9 +973,8 @@ static int vmdk_open_vmdk4(BlockDriverState *bs,
     } QEMU_PACKED eos_marker;
 } QEMU_PACKED footer;
-ret = bdrv_pread(file,
-                 bs->file->bs->total_sectors * 512 - 1536,
-                 &footer, sizeof(footer));
+ret = bdrv_pread(file, bs->file->bs->total_sectors * 512 - 1536,
+                 sizeof(footer), &footer, 0);
 if (ret < 0) {
     error_setg_errno(errp, -ret, "Failed to read footer");
     return ret;
@@ -1448,16 +1441,16 @@ static int get_whole_cluster(BlockDriverState *bs,
 if (copy_from_backing) {
     /* qcow2 emits this on bs->file instead of bs->backing */
     BLKDBG_EVENT(extent->file, BLKDBG_COW_READ);
-    ret = bdrv_pread(bs->backing, offset, whole_grain,
-                     skip_start_bytes);
+    ret = bdrv_pread(bs->backing, offset, skip_start_bytes,
+                     whole_grain, 0);
     if (ret < 0) {
         ret = VMDK_ERROR;
         goto exit;
     }
 }
 BLKDBG_EVENT(extent->file, BLKDBG_COW_WRITE);
-ret = bdrv_pwrite(extent->file, cluster_offset, whole_grain,
-                  skip_start_bytes);
+ret = bdrv_pwrite(extent->file, cluster_offset, skip_start_bytes,
+                  whole_grain, 0);
 if (ret < 0) {
     ret = VMDK_ERROR;
     goto exit;
@@ -1469,8 +1462,8 @@ static int get_whole_cluster(BlockDriverState *bs,
     /* qcow2 emits this on bs->file instead of bs->backing */
     BLKDBG_EVENT(extent->file, BLKDBG_COW_READ);
     ret = bdrv_pread(bs->backing, offset + skip_end_bytes,
-                     whole_grain + skip_end_bytes,
-                     cluster_bytes - skip_end_bytes);
+                     cluster_bytes - skip_end_bytes,
+                     whole_grain + skip_end_bytes, 0);
     if (ret < 0) {
         ret = VMDK_ERROR;
         goto exit;
@@ -1478,8 +1471,8 @@ static int get_whole_cluster(BlockDriverState *bs,
 }
 BLKDBG_EVENT(extent->file, BLKDBG_COW_WRITE);
 ret = bdrv_pwrite(extent->file, cluster_offset + skip_end_bytes,
-                  whole_grain + skip_end_bytes,
-                  cluster_bytes - skip_end_bytes);
+                  cluster_bytes - skip_end_bytes,
+                  whole_grain + skip_end_bytes, 0);
 if (ret < 0) {
     ret = VMDK_ERROR;
     goto exit;
@@ -1501,7 +1494,7 @@ static int vmdk_L2update(VmdkExtent *extent, VmdkMetaData *m_data,
 if (bdrv_pwrite(extent->file,
                 ((int64_t)m_data->l2_offset * 512)
                     + (m_data->l2_index * sizeof(offset)),
-                &offset, sizeof(offset)) < 0) {
+                sizeof(offset), &offset, 0) < 0) {
     return VMDK_ERROR;
 }
 /* update backup L2 table */
@@ -1510,7 +1503,7 @@ static int vmdk_L2update(VmdkExtent *extent, VmdkMetaData *m_data,
     if (bdrv_pwrite(extent->file,
                     ((int64_t)m_data->l2_offset * 512)
                         + (m_data->l2_index * sizeof(offset)),
-                    &offset, sizeof(offset)) < 0) {
+                    sizeof(offset), &offset, 0) < 0) {
         return VMDK_ERROR;
     }
 }
@@ -1633,9 +1626,10 @@ static int get_cluster_offset(BlockDriverState *bs,
 BLKDBG_EVENT(extent->file, BLKDBG_L2_LOAD);
 if (bdrv_pread(extent->file,
                (int64_t)l2_offset * 512,
+               l2_size_bytes,
                l2_table,
-               l2_size_bytes
-               ) != l2_size_bytes) {
+               0
+               ) < 0) {
     return VMDK_ERROR;
 }
@@ -1903,9 +1897,7 @@ static int vmdk_read_extent(VmdkExtent *extent, int64_t cluster_offset,
 cluster_buf = g_malloc(buf_bytes);
 uncomp_buf = g_malloc(cluster_bytes);
 BLKDBG_EVENT(extent->file, BLKDBG_READ_COMPRESSED);
-ret = bdrv_pread(extent->file,
-                 cluster_offset,
-                 cluster_buf, buf_bytes);
+ret = bdrv_pread(extent->file, cluster_offset, buf_bytes, cluster_buf, 0);
 if (ret < 0) {
     goto out;
 }
@@ -2244,12 +2236,12 @@ static int vmdk_init_extent(BlockBackend *blk,
 header.check_bytes[3] = 0xa;
 /* write all the data */
-ret = blk_pwrite(blk, 0, &magic, sizeof(magic), 0);
+ret = blk_pwrite(blk, 0, sizeof(magic), &magic, 0);
 if (ret < 0) {
     error_setg(errp, QERR_IO_ERROR);
     goto exit;
 }
-ret = blk_pwrite(blk, sizeof(magic), &header, sizeof(header), 0);
+ret = blk_pwrite(blk, sizeof(magic), sizeof(header), &header, 0);
 if (ret < 0) {
     error_setg(errp, QERR_IO_ERROR);
     goto exit;
@@ -2269,7 +2261,7 @@ static int vmdk_init_extent(BlockBackend *blk,
     gd_buf[i] = cpu_to_le32(tmp);
 }
 ret = blk_pwrite(blk, le64_to_cpu(header.rgd_offset) * BDRV_SECTOR_SIZE,
-                 gd_buf, gd_buf_size, 0);
+                 gd_buf_size, gd_buf, 0);
 if (ret < 0) {
     error_setg(errp, QERR_IO_ERROR);
     goto exit;
@@ -2281,7 +2273,7 @@ static int vmdk_init_extent(BlockBackend *blk,
     gd_buf[i] = cpu_to_le32(tmp);
 }
 ret = blk_pwrite(blk, le64_to_cpu(header.gd_offset) * BDRV_SECTOR_SIZE,
-                 gd_buf, gd_buf_size, 0);
+                 gd_buf_size, gd_buf, 0);
 if (ret < 0) {
     error_setg(errp, QERR_IO_ERROR);
 }
@@ -2592,7 +2584,7 @@ static int coroutine_fn vmdk_co_do_create(int64_t size,
     desc_offset = 0x200;
 }
-ret = blk_pwrite(blk, desc_offset, desc, desc_len, 0);
+ret = blk_pwrite(blk, desc_offset, desc_len, desc, 0);
 if (ret < 0) {
     error_setg_errno(errp, -ret, "Could not write description");
     goto exit;


@@ -252,7 +252,7 @@ static int vpc_open(BlockDriverState *bs, QDict *options, int flags,
     goto fail;
 }
-ret = bdrv_pread(bs->file, 0, &s->footer, sizeof(s->footer));
+ret = bdrv_pread(bs->file, 0, sizeof(s->footer), &s->footer, 0);
 if (ret < 0) {
     error_setg(errp, "Unable to read VHD header");
     goto fail;
@@ -272,8 +272,8 @@ static int vpc_open(BlockDriverState *bs, QDict *options, int flags,
 }
 /* If a fixed disk, the footer is found only at the end of the file */
-ret = bdrv_pread(bs->file, offset - sizeof(*footer),
-                 footer, sizeof(*footer));
+ret = bdrv_pread(bs->file, offset - sizeof(*footer), sizeof(*footer),
+                 footer, 0);
 if (ret < 0) {
     goto fail;
 }
@@ -347,7 +347,7 @@ static int vpc_open(BlockDriverState *bs, QDict *options, int flags,
 if (disk_type == VHD_DYNAMIC) {
     ret = bdrv_pread(bs->file, be64_to_cpu(footer->data_offset),
-                     &dyndisk_header, sizeof(dyndisk_header));
+                     sizeof(dyndisk_header), &dyndisk_header, 0);
     if (ret < 0) {
         error_setg(errp, "Error reading dynamic VHD header");
         goto fail;
@@ -401,8 +401,8 @@ static int vpc_open(BlockDriverState *bs, QDict *options, int flags,
 s->bat_offset = be64_to_cpu(dyndisk_header.table_offset);
-ret = bdrv_pread(bs->file, s->bat_offset, s->pagetable,
-                 pagetable_size);
+ret = bdrv_pread(bs->file, s->bat_offset, pagetable_size,
+                 s->pagetable, 0);
 if (ret < 0) {
     error_setg(errp, "Error reading pagetable");
     goto fail;
@@ -516,7 +516,8 @@ static inline int64_t get_image_offset(BlockDriverState *bs, uint64_t offset,
 s->last_bitmap_offset = bitmap_offset;
 memset(bitmap, 0xff, s->bitmap_size);
-r = bdrv_pwrite_sync(bs->file, bitmap_offset, bitmap, s->bitmap_size);
+r = bdrv_pwrite_sync(bs->file, bitmap_offset, s->bitmap_size, bitmap,
+                     0);
 if (r < 0) {
     *err = r;
     return -2;
@@ -538,7 +539,7 @@ static int rewrite_footer(BlockDriverState *bs)
 BDRVVPCState *s = bs->opaque;
 int64_t offset = s->free_data_block_offset;
-ret = bdrv_pwrite_sync(bs->file, offset, &s->footer, sizeof(s->footer));
+ret = bdrv_pwrite_sync(bs->file, offset, sizeof(s->footer), &s->footer, 0);
 if (ret < 0)
     return ret;
@@ -572,8 +573,8 @@ static int64_t alloc_block(BlockDriverState *bs, int64_t offset)
 /* Initialize the block's bitmap */
 memset(bitmap, 0xff, s->bitmap_size);
-ret = bdrv_pwrite_sync(bs->file, s->free_data_block_offset, bitmap,
-                       s->bitmap_size);
+ret = bdrv_pwrite_sync(bs->file, s->free_data_block_offset,
                       s->bitmap_size, bitmap, 0);
 if (ret < 0) {
     return ret;
 }
@@ -587,7 +588,7 @@ static int64_t alloc_block(BlockDriverState *bs, int64_t offset)
 /* Write BAT entry to disk */
 bat_offset = s->bat_offset + (4 * index);
 bat_value = cpu_to_be32(s->pagetable[index]);
-ret = bdrv_pwrite_sync(bs->file, bat_offset, &bat_value, 4);
+ret = bdrv_pwrite_sync(bs->file, bat_offset, 4, &bat_value, 0);
 if (ret < 0)
     goto fail;
@@ -833,13 +834,13 @@ static int create_dynamic_disk(BlockBackend *blk, VHDFooter *footer,
 block_size = 0x200000;
 num_bat_entries = DIV_ROUND_UP(total_sectors, block_size / 512);
-ret = blk_pwrite(blk, offset, footer, sizeof(*footer), 0);
+ret = blk_pwrite(blk, offset, sizeof(*footer), footer, 0);
 if (ret < 0) {
     goto fail;
 }
 offset = 1536 + ((num_bat_entries * 4 + 511) & ~511);
-ret = blk_pwrite(blk, offset, footer, sizeof(*footer), 0);
+ret = blk_pwrite(blk, offset, sizeof(*footer), footer, 0);
 if (ret < 0) {
     goto fail;
 }
@@ -849,7 +850,7 @@ static int create_dynamic_disk(BlockBackend *blk, VHDFooter *footer,
 memset(bat_sector, 0xFF, 512);
 for (i = 0; i < DIV_ROUND_UP(num_bat_entries * 4, 512); i++) {
-    ret = blk_pwrite(blk, offset, bat_sector, 512, 0);
+    ret = blk_pwrite(blk, offset, 512, bat_sector, 0);
     if (ret < 0) {
         goto fail;
     }
@@ -877,7 +878,7 @@ static int create_dynamic_disk(BlockBackend *blk, VHDFooter *footer,
 /* Write the header */
 offset = 512;
-ret = blk_pwrite(blk, offset, &dyndisk_header, sizeof(dyndisk_header), 0);
+ret = blk_pwrite(blk, offset, sizeof(dyndisk_header), &dyndisk_header, 0);
 if (ret < 0) {
     goto fail;
 }
@@ -900,8 +901,8 @@ static int create_fixed_disk(BlockBackend *blk, VHDFooter *footer,
     return ret;
 }
-ret = blk_pwrite(blk, total_size - sizeof(*footer),
-                 footer, sizeof(*footer), 0);
+ret = blk_pwrite(blk, total_size - sizeof(*footer), sizeof(*footer),
+                 footer, 0);
 if (ret < 0) {
     error_setg_errno(errp, -ret, "Unable to write VHD header");
     return ret;


@@ -1488,8 +1488,8 @@ static int vvfat_read(BlockDriverState *bs, int64_t sector_num,
 DLOG(fprintf(stderr, "sectors %" PRId64 "+%" PRId64
              " allocated\n", sector_num,
              n >> BDRV_SECTOR_BITS));
-if (bdrv_pread(s->qcow, sector_num * BDRV_SECTOR_SIZE,
-               buf + i * 0x200, n) < 0) {
+if (bdrv_pread(s->qcow, sector_num * BDRV_SECTOR_SIZE, n,
+               buf + i * 0x200, 0) < 0) {
     return -1;
 }
 i += (n >> BDRV_SECTOR_BITS) - 1;
@@ -1978,7 +1978,8 @@ static uint32_t get_cluster_count_for_direntry(BDRVVVFATState* s,
     return -1;
 }
 res = bdrv_pwrite(s->qcow, offset * BDRV_SECTOR_SIZE,
-                  s->cluster_buffer, BDRV_SECTOR_SIZE);
+                  BDRV_SECTOR_SIZE, s->cluster_buffer,
+                  0);
 if (res < 0) {
     return -2;
 }
@@ -3062,8 +3063,8 @@ DLOG(checkpoint());
  * Use qcow backend. Commit later.
  */
 DLOG(fprintf(stderr, "Write to qcow backend: %d + %d\n", (int)sector_num, nb_sectors));
-ret = bdrv_pwrite(s->qcow, sector_num * BDRV_SECTOR_SIZE, buf,
-                  nb_sectors * BDRV_SECTOR_SIZE);
+ret = bdrv_pwrite(s->qcow, sector_num * BDRV_SECTOR_SIZE,
+                  nb_sectors * BDRV_SECTOR_SIZE, buf, 0);
 if (ret < 0) {
     fprintf(stderr, "Error writing to qcow backend\n");
     return ret;


@@ -22,11 +22,43 @@
#include "qemu/path.h"
#define LOCK_PATH(p, arg) \
do { \
(p) = lock_user_string(arg); \
if ((p) == NULL) { \
return -TARGET_EFAULT; \
} \
} while (0)
#define UNLOCK_PATH(p, arg) unlock_user(p, arg, 0)
#define LOCK_PATH2(p1, arg1, p2, arg2) \
do { \
(p1) = lock_user_string(arg1); \
if ((p1) == NULL) { \
return -TARGET_EFAULT; \
} \
(p2) = lock_user_string(arg2); \
if ((p2) == NULL) { \
unlock_user(p1, arg1, 0); \
return -TARGET_EFAULT; \
} \
} while (0)
#define UNLOCK_PATH2(p1, arg1, p2, arg2) \
do { \
unlock_user(p2, arg2, 0); \
unlock_user(p1, arg1, 0); \
} while (0)
extern struct iovec *lock_iovec(int type, abi_ulong target_addr, int count,
                                int copy);
extern void unlock_iovec(struct iovec *vec, abi_ulong target_addr, int count,
                         int copy);
int safe_open(const char *path, int flags, mode_t mode);
int safe_openat(int fd, const char *path, int flags, mode_t mode);
ssize_t safe_read(int fd, void *buf, size_t nbytes);
ssize_t safe_pread(int fd, void *buf, size_t nbytes, off_t offset);
ssize_t safe_readv(int fd, const struct iovec *iov, int iovcnt);
@@ -190,4 +222,721 @@ static abi_long do_bsd_pwritev(void *cpu_env, abi_long arg1,
    return ret;
}
/* open(2) */
static abi_long do_bsd_open(abi_long arg1, abi_long arg2, abi_long arg3)
{
abi_long ret;
void *p;
LOCK_PATH(p, arg1);
ret = get_errno(safe_open(path(p), target_to_host_bitmask(arg2,
fcntl_flags_tbl), arg3));
UNLOCK_PATH(p, arg1);
return ret;
}
/* openat(2) */
static abi_long do_bsd_openat(abi_long arg1, abi_long arg2,
abi_long arg3, abi_long arg4)
{
abi_long ret;
void *p;
LOCK_PATH(p, arg2);
ret = get_errno(safe_openat(arg1, path(p),
target_to_host_bitmask(arg3, fcntl_flags_tbl), arg4));
UNLOCK_PATH(p, arg2);
return ret;
}
/* close(2) */
static abi_long do_bsd_close(abi_long arg1)
{
return get_errno(close(arg1));
}
/* fdatasync(2) */
static abi_long do_bsd_fdatasync(abi_long arg1)
{
return get_errno(fdatasync(arg1));
}
/* fsync(2) */
static abi_long do_bsd_fsync(abi_long arg1)
{
return get_errno(fsync(arg1));
}
/* closefrom(2) */
static abi_long do_bsd_closefrom(abi_long arg1)
{
closefrom(arg1); /* returns void */
return get_errno(0);
}
/* revoke(2) */
static abi_long do_bsd_revoke(abi_long arg1)
{
abi_long ret;
void *p;
LOCK_PATH(p, arg1);
ret = get_errno(revoke(p)); /* XXX path(p)? */
UNLOCK_PATH(p, arg1);
return ret;
}
/* access(2) */
static abi_long do_bsd_access(abi_long arg1, abi_long arg2)
{
abi_long ret;
void *p;
LOCK_PATH(p, arg1);
ret = get_errno(access(path(p), arg2));
UNLOCK_PATH(p, arg1);
return ret;
}
/* eaccess(2) */
static abi_long do_bsd_eaccess(abi_long arg1, abi_long arg2)
{
abi_long ret;
void *p;
LOCK_PATH(p, arg1);
ret = get_errno(eaccess(path(p), arg2));
UNLOCK_PATH(p, arg1);
return ret;
}
/* faccessat(2) */
static abi_long do_bsd_faccessat(abi_long arg1, abi_long arg2,
abi_long arg3, abi_long arg4)
{
abi_long ret;
void *p;
LOCK_PATH(p, arg2);
ret = get_errno(faccessat(arg1, p, arg3, arg4)); /* XXX path(p)? */
UNLOCK_PATH(p, arg2);
return ret;
}
/* chdir(2) */
static abi_long do_bsd_chdir(abi_long arg1)
{
abi_long ret;
void *p;
LOCK_PATH(p, arg1);
ret = get_errno(chdir(p)); /* XXX path(p)? */
UNLOCK_PATH(p, arg1);
return ret;
}
/* fchdir(2) */
static abi_long do_bsd_fchdir(abi_long arg1)
{
return get_errno(fchdir(arg1));
}
/* rename(2) */
static abi_long do_bsd_rename(abi_long arg1, abi_long arg2)
{
abi_long ret;
void *p1, *p2;
LOCK_PATH2(p1, arg1, p2, arg2);
ret = get_errno(rename(p1, p2)); /* XXX path(p1), path(p2) */
UNLOCK_PATH2(p1, arg1, p2, arg2);
return ret;
}
/* renameat(2) */
static abi_long do_bsd_renameat(abi_long arg1, abi_long arg2,
abi_long arg3, abi_long arg4)
{
abi_long ret;
void *p1, *p2;
LOCK_PATH2(p1, arg2, p2, arg4);
ret = get_errno(renameat(arg1, p1, arg3, p2));
UNLOCK_PATH2(p1, arg2, p2, arg4);
return ret;
}
/* link(2) */
static abi_long do_bsd_link(abi_long arg1, abi_long arg2)
{
abi_long ret;
void *p1, *p2;
LOCK_PATH2(p1, arg1, p2, arg2);
ret = get_errno(link(p1, p2)); /* XXX path(p1), path(p2) */
UNLOCK_PATH2(p1, arg1, p2, arg2);
return ret;
}
/* linkat(2) */
static abi_long do_bsd_linkat(abi_long arg1, abi_long arg2,
abi_long arg3, abi_long arg4, abi_long arg5)
{
abi_long ret;
void *p1, *p2;
LOCK_PATH2(p1, arg2, p2, arg4);
ret = get_errno(linkat(arg1, p1, arg3, p2, arg5));
UNLOCK_PATH2(p1, arg2, p2, arg4);
return ret;
}
/* unlink(2) */
static abi_long do_bsd_unlink(abi_long arg1)
{
abi_long ret;
void *p;
LOCK_PATH(p, arg1);
ret = get_errno(unlink(p)); /* XXX path(p) */
UNLOCK_PATH(p, arg1);
return ret;
}
/* unlinkat(2) */
static abi_long do_bsd_unlinkat(abi_long arg1, abi_long arg2,
abi_long arg3)
{
abi_long ret;
void *p;
LOCK_PATH(p, arg2);
ret = get_errno(unlinkat(arg1, p, arg3)); /* XXX path(p) */
UNLOCK_PATH(p, arg2);
return ret;
}
/* mkdir(2) */
static abi_long do_bsd_mkdir(abi_long arg1, abi_long arg2)
{
abi_long ret;
void *p;
LOCK_PATH(p, arg1);
ret = get_errno(mkdir(p, arg2)); /* XXX path(p) */
UNLOCK_PATH(p, arg1);
return ret;
}
/* mkdirat(2) */
static abi_long do_bsd_mkdirat(abi_long arg1, abi_long arg2,
abi_long arg3)
{
abi_long ret;
void *p;
LOCK_PATH(p, arg2);
ret = get_errno(mkdirat(arg1, p, arg3));
UNLOCK_PATH(p, arg2);
return ret;
}
/* rmdir(2) */
static abi_long do_bsd_rmdir(abi_long arg1)
{
abi_long ret;
void *p;
LOCK_PATH(p, arg1);
ret = get_errno(rmdir(p)); /* XXX path(p)? */
UNLOCK_PATH(p, arg1);
return ret;
}
/* undocumented __getcwd(char *buf, size_t len) system call */
static abi_long do_bsd___getcwd(abi_long arg1, abi_long arg2)
{
abi_long ret;
void *p;
p = lock_user(VERIFY_WRITE, arg1, arg2, 0);
if (p == NULL) {
return -TARGET_EFAULT;
}
ret = safe_syscall(SYS___getcwd, p, arg2);
unlock_user(p, arg1, ret == 0 ? strlen(p) + 1 : 0);
return get_errno(ret);
}
/* dup(2) */
static abi_long do_bsd_dup(abi_long arg1)
{
return get_errno(dup(arg1));
}
/* dup2(2) */
static abi_long do_bsd_dup2(abi_long arg1, abi_long arg2)
{
return get_errno(dup2(arg1, arg2));
}
/* truncate(2) */
static abi_long do_bsd_truncate(void *cpu_env, abi_long arg1,
abi_long arg2, abi_long arg3, abi_long arg4)
{
abi_long ret;
void *p;
LOCK_PATH(p, arg1);
if (regpairs_aligned(cpu_env) != 0) {
arg2 = arg3;
arg3 = arg4;
}
ret = get_errno(truncate(p, target_arg64(arg2, arg3)));
UNLOCK_PATH(p, arg1);
return ret;
}
/* ftruncate(2) */
static abi_long do_bsd_ftruncate(void *cpu_env, abi_long arg1,
abi_long arg2, abi_long arg3, abi_long arg4)
{
if (regpairs_aligned(cpu_env) != 0) {
arg2 = arg3;
arg3 = arg4;
}
return get_errno(ftruncate(arg1, target_arg64(arg2, arg3)));
}
/* acct(2) */
static abi_long do_bsd_acct(abi_long arg1)
{
abi_long ret;
void *p;
if (arg1 == 0) {
ret = get_errno(acct(NULL));
} else {
LOCK_PATH(p, arg1);
ret = get_errno(acct(path(p)));
UNLOCK_PATH(p, arg1);
}
return ret;
}
/* sync(2) */
static abi_long do_bsd_sync(void)
{
sync();
return 0;
}
/* mount(2) */
static abi_long do_bsd_mount(abi_long arg1, abi_long arg2, abi_long arg3,
abi_long arg4)
{
abi_long ret;
void *p1, *p2;
LOCK_PATH2(p1, arg1, p2, arg2);
/*
 * XXX arg4 should be locked, but it isn't clear how to do that since it
 * may not be a NUL-terminated string.
 */
if (arg4 == 0) {
ret = get_errno(mount(p1, p2, arg3, NULL)); /* XXX path(p2)? */
} else {
ret = get_errno(mount(p1, p2, arg3, g2h_untagged(arg4))); /* XXX path(p2)? */
}
UNLOCK_PATH2(p1, arg1, p2, arg2);
return ret;
}
/* unmount(2) */
static abi_long do_bsd_unmount(abi_long arg1, abi_long arg2)
{
abi_long ret;
void *p;
LOCK_PATH(p, arg1);
ret = get_errno(unmount(p, arg2)); /* XXX path(p)? */
UNLOCK_PATH(p, arg1);
return ret;
}
/* nmount(2) */
static abi_long do_bsd_nmount(abi_long arg1, abi_long count,
abi_long flags)
{
abi_long ret;
struct iovec *vec = lock_iovec(VERIFY_READ, arg1, count, 1);
if (vec != NULL) {
ret = get_errno(nmount(vec, count, flags));
unlock_iovec(vec, arg1, count, 0);
} else {
return -TARGET_EFAULT;
}
return ret;
}
/* symlink(2) */
static abi_long do_bsd_symlink(abi_long arg1, abi_long arg2)
{
abi_long ret;
void *p1, *p2;
LOCK_PATH2(p1, arg1, p2, arg2);
ret = get_errno(symlink(p1, p2)); /* XXX path(p1), path(p2) */
UNLOCK_PATH2(p1, arg1, p2, arg2);
return ret;
}
/* symlinkat(2) */
static abi_long do_bsd_symlinkat(abi_long arg1, abi_long arg2,
abi_long arg3)
{
abi_long ret;
void *p1, *p2;
LOCK_PATH2(p1, arg1, p2, arg3);
ret = get_errno(symlinkat(p1, arg2, p2)); /* XXX path(p1), path(p2) */
UNLOCK_PATH2(p1, arg1, p2, arg3);
return ret;
}
/* readlink(2) */
static abi_long do_bsd_readlink(CPUArchState *env, abi_long arg1,
abi_long arg2, abi_long arg3)
{
abi_long ret;
void *p1, *p2;
LOCK_PATH(p1, arg1);
p2 = lock_user(VERIFY_WRITE, arg2, arg3, 0);
if (p2 == NULL) {
UNLOCK_PATH(p1, arg1);
return -TARGET_EFAULT;
}
if (strcmp(p1, "/proc/curproc/file") == 0) {
CPUState *cpu = env_cpu(env);
TaskState *ts = (TaskState *)cpu->opaque;
strncpy(p2, ts->bprm->fullpath, arg3);
ret = MIN((abi_long)strlen(ts->bprm->fullpath), arg3);
} else {
ret = get_errno(readlink(path(p1), p2, arg3));
}
unlock_user(p2, arg2, ret);
UNLOCK_PATH(p1, arg1);
return ret;
}
/* readlinkat(2) */
static abi_long do_bsd_readlinkat(abi_long arg1, abi_long arg2,
abi_long arg3, abi_long arg4)
{
abi_long ret;
void *p1, *p2;
LOCK_PATH(p1, arg2);
p2 = lock_user(VERIFY_WRITE, arg3, arg4, 0);
if (p2 == NULL) {
UNLOCK_PATH(p1, arg2);
return -TARGET_EFAULT;
}
ret = get_errno(readlinkat(arg1, p1, p2, arg4));
unlock_user(p2, arg3, ret);
UNLOCK_PATH(p1, arg2);
return ret;
}
/* chmod(2) */
static abi_long do_bsd_chmod(abi_long arg1, abi_long arg2)
{
abi_long ret;
void *p;
LOCK_PATH(p, arg1);
ret = get_errno(chmod(p, arg2)); /* XXX path(p)? */
UNLOCK_PATH(p, arg1);
return ret;
}
/* fchmod(2) */
static abi_long do_bsd_fchmod(abi_long arg1, abi_long arg2)
{
return get_errno(fchmod(arg1, arg2));
}
/* lchmod(2) */
static abi_long do_bsd_lchmod(abi_long arg1, abi_long arg2)
{
abi_long ret;
void *p;
LOCK_PATH(p, arg1);
ret = get_errno(lchmod(p, arg2)); /* XXX path(p)? */
UNLOCK_PATH(p, arg1);
return ret;
}
/* fchmodat(2) */
static abi_long do_bsd_fchmodat(abi_long arg1, abi_long arg2,
abi_long arg3, abi_long arg4)
{
abi_long ret;
void *p;
LOCK_PATH(p, arg2);
ret = get_errno(fchmodat(arg1, p, arg3, arg4));
UNLOCK_PATH(p, arg2);
return ret;
}
/* pre-ino64 mknod(2) */
static abi_long do_bsd_freebsd11_mknod(abi_long arg1, abi_long arg2, abi_long arg3)
{
abi_long ret;
void *p;
LOCK_PATH(p, arg1);
ret = get_errno(syscall(SYS_freebsd11_mknod, p, arg2, arg3));
UNLOCK_PATH(p, arg1);
return ret;
}
/* pre-ino64 mknodat(2) */
static abi_long do_bsd_freebsd11_mknodat(abi_long arg1, abi_long arg2,
abi_long arg3, abi_long arg4)
{
abi_long ret;
void *p;
LOCK_PATH(p, arg2);
ret = get_errno(syscall(SYS_freebsd11_mknodat, arg1, p, arg3, arg4));
UNLOCK_PATH(p, arg2);
return ret;
}
/* post-ino64 mknodat(2) */
static abi_long do_bsd_mknodat(void *cpu_env, abi_long arg1,
abi_long arg2, abi_long arg3, abi_long arg4, abi_long arg5,
abi_long arg6)
{
abi_long ret;
void *p;
LOCK_PATH(p, arg2);
/* 32-bit archs pass the 64-bit dev_t argument in a pair of 32-bit registers */
if (regpairs_aligned(cpu_env) != 0) {
ret = get_errno(mknodat(arg1, p, arg3, target_arg64(arg5, arg6)));
} else {
ret = get_errno(mknodat(arg1, p, arg3, target_arg64(arg4, arg5)));
}
UNLOCK_PATH(p, arg2);
return ret;
}
/* chown(2) */
static abi_long do_bsd_chown(abi_long arg1, abi_long arg2, abi_long arg3)
{
    abi_long ret;
    void *p;

    LOCK_PATH(p, arg1);
    ret = get_errno(chown(p, arg2, arg3)); /* XXX path(p)? */
    UNLOCK_PATH(p, arg1);
    return ret;
}

/* fchown(2) */
static abi_long do_bsd_fchown(abi_long arg1, abi_long arg2, abi_long arg3)
{
    return get_errno(fchown(arg1, arg2, arg3));
}

/* lchown(2) */
static abi_long do_bsd_lchown(abi_long arg1, abi_long arg2, abi_long arg3)
{
    abi_long ret;
    void *p;

    LOCK_PATH(p, arg1);
    ret = get_errno(lchown(p, arg2, arg3)); /* XXX path(p)? */
    UNLOCK_PATH(p, arg1);
    return ret;
}

/* fchownat(2) */
static abi_long do_bsd_fchownat(abi_long arg1, abi_long arg2,
                                abi_long arg3, abi_long arg4, abi_long arg5)
{
    abi_long ret;
    void *p;

    LOCK_PATH(p, arg2);
    ret = get_errno(fchownat(arg1, p, arg3, arg4, arg5)); /* XXX path(p)? */
    UNLOCK_PATH(p, arg2);
    return ret;
}

/* chflags(2) */
static abi_long do_bsd_chflags(abi_long arg1, abi_long arg2)
{
    abi_long ret;
    void *p;

    LOCK_PATH(p, arg1);
    ret = get_errno(chflags(p, arg2)); /* XXX path(p)? */
    UNLOCK_PATH(p, arg1);
    return ret;
}

/* lchflags(2) */
static abi_long do_bsd_lchflags(abi_long arg1, abi_long arg2)
{
    abi_long ret;
    void *p;

    LOCK_PATH(p, arg1);
    ret = get_errno(lchflags(p, arg2)); /* XXX path(p)? */
    UNLOCK_PATH(p, arg1);
    return ret;
}

/* fchflags(2) */
static abi_long do_bsd_fchflags(abi_long arg1, abi_long arg2)
{
    return get_errno(fchflags(arg1, arg2));
}

/* chroot(2) */
static abi_long do_bsd_chroot(abi_long arg1)
{
    abi_long ret;
    void *p;

    LOCK_PATH(p, arg1);
    ret = get_errno(chroot(p)); /* XXX path(p)? */
    UNLOCK_PATH(p, arg1);
    return ret;
}

/* flock(2) */
static abi_long do_bsd_flock(abi_long arg1, abi_long arg2)
{
    return get_errno(flock(arg1, arg2));
}

/* mkfifo(2) */
static abi_long do_bsd_mkfifo(abi_long arg1, abi_long arg2)
{
    abi_long ret;
    void *p;

    LOCK_PATH(p, arg1);
    ret = get_errno(mkfifo(p, arg2)); /* XXX path(p)? */
    UNLOCK_PATH(p, arg1);
    return ret;
}

/* mkfifoat(2) */
static abi_long do_bsd_mkfifoat(abi_long arg1, abi_long arg2, abi_long arg3)
{
    abi_long ret;
    void *p;

    LOCK_PATH(p, arg2);
    ret = get_errno(mkfifoat(arg1, p, arg3));
    UNLOCK_PATH(p, arg2);
    return ret;
}

/* pathconf(2) */
static abi_long do_bsd_pathconf(abi_long arg1, abi_long arg2)
{
    abi_long ret;
    void *p;

    LOCK_PATH(p, arg1);
    ret = get_errno(pathconf(p, arg2)); /* XXX path(p)? */
    UNLOCK_PATH(p, arg1);
    return ret;
}

/* lpathconf(2) */
static abi_long do_bsd_lpathconf(abi_long arg1, abi_long arg2)
{
    abi_long ret;
    void *p;

    LOCK_PATH(p, arg1);
    ret = get_errno(lpathconf(p, arg2)); /* XXX path(p)? */
    UNLOCK_PATH(p, arg1);
    return ret;
}

/* fpathconf(2) */
static abi_long do_bsd_fpathconf(abi_long arg1, abi_long arg2)
{
    return get_errno(fpathconf(arg1, arg2));
}

/* undelete(2) */
static abi_long do_bsd_undelete(abi_long arg1)
{
    abi_long ret;
    void *p;

    LOCK_PATH(p, arg1);
    ret = get_errno(undelete(p)); /* XXX path(p)? */
    UNLOCK_PATH(p, arg1);
    return ret;
}

#endif /* BSD_FILE_H */


@@ -32,7 +32,9 @@
 #include "qemu/cutils.h"
 #include "qemu/path.h"
 #include <sys/syscall.h>
+#include <sys/cdefs.h>
 #include <sys/param.h>
+#include <sys/mount.h>
 #include <sys/sysctl.h>
 #include <utime.h>
@@ -44,6 +46,10 @@
 #include "bsd-proc.h"

 /* I/O */
+safe_syscall3(int, open, const char *, path, int, flags, mode_t, mode);
+safe_syscall4(int, openat, int, fd, const char *, path, int, flags, mode_t,
+    mode);
 safe_syscall3(ssize_t, read, int, fd, void *, buf, size_t, nbytes);
 safe_syscall4(ssize_t, pread, int, fd, void *, buf, size_t, nbytes, off_t,
     offset);
@@ -257,6 +263,234 @@ static abi_long freebsd_syscall(void *cpu_env, int num, abi_long arg1,
         ret = do_bsd_pwritev(cpu_env, arg1, arg2, arg3, arg4, arg5, arg6);
         break;

+    case TARGET_FREEBSD_NR_open: /* open(2) */
+        ret = do_bsd_open(arg1, arg2, arg3);
+        break;
+
+    case TARGET_FREEBSD_NR_openat: /* openat(2) */
+        ret = do_bsd_openat(arg1, arg2, arg3, arg4);
+        break;
+
+    case TARGET_FREEBSD_NR_close: /* close(2) */
+        ret = do_bsd_close(arg1);
+        break;
+
+    case TARGET_FREEBSD_NR_fdatasync: /* fdatasync(2) */
+        ret = do_bsd_fdatasync(arg1);
+        break;
+
+    case TARGET_FREEBSD_NR_fsync: /* fsync(2) */
+        ret = do_bsd_fsync(arg1);
+        break;
+
+    case TARGET_FREEBSD_NR_freebsd12_closefrom: /* closefrom(2) */
+        ret = do_bsd_closefrom(arg1);
+        break;
+
+    case TARGET_FREEBSD_NR_revoke: /* revoke(2) */
+        ret = do_bsd_revoke(arg1);
+        break;
+
+    case TARGET_FREEBSD_NR_access: /* access(2) */
+        ret = do_bsd_access(arg1, arg2);
+        break;
+
+    case TARGET_FREEBSD_NR_eaccess: /* eaccess(2) */
+        ret = do_bsd_eaccess(arg1, arg2);
+        break;
+
+    case TARGET_FREEBSD_NR_faccessat: /* faccessat(2) */
+        ret = do_bsd_faccessat(arg1, arg2, arg3, arg4);
+        break;
+
+    case TARGET_FREEBSD_NR_chdir: /* chdir(2) */
+        ret = do_bsd_chdir(arg1);
+        break;
+
+    case TARGET_FREEBSD_NR_fchdir: /* fchdir(2) */
+        ret = do_bsd_fchdir(arg1);
+        break;
+
+    case TARGET_FREEBSD_NR_rename: /* rename(2) */
+        ret = do_bsd_rename(arg1, arg2);
+        break;
+
+    case TARGET_FREEBSD_NR_renameat: /* renameat(2) */
+        ret = do_bsd_renameat(arg1, arg2, arg3, arg4);
+        break;
+
+    case TARGET_FREEBSD_NR_link: /* link(2) */
+        ret = do_bsd_link(arg1, arg2);
+        break;
+
+    case TARGET_FREEBSD_NR_linkat: /* linkat(2) */
+        ret = do_bsd_linkat(arg1, arg2, arg3, arg4, arg5);
+        break;
+
+    case TARGET_FREEBSD_NR_unlink: /* unlink(2) */
+        ret = do_bsd_unlink(arg1);
+        break;
+
+    case TARGET_FREEBSD_NR_unlinkat: /* unlinkat(2) */
+        ret = do_bsd_unlinkat(arg1, arg2, arg3);
+        break;
+
+    case TARGET_FREEBSD_NR_mkdir: /* mkdir(2) */
+        ret = do_bsd_mkdir(arg1, arg2);
+        break;
+
+    case TARGET_FREEBSD_NR_mkdirat: /* mkdirat(2) */
+        ret = do_bsd_mkdirat(arg1, arg2, arg3);
+        break;
+
+    case TARGET_FREEBSD_NR_rmdir: /* rmdir(2) (XXX no rmdirat()?) */
+        ret = do_bsd_rmdir(arg1);
+        break;
+
+    case TARGET_FREEBSD_NR___getcwd: /* undocumented __getcwd() */
+        ret = do_bsd___getcwd(arg1, arg2);
+        break;
+
+    case TARGET_FREEBSD_NR_dup: /* dup(2) */
+        ret = do_bsd_dup(arg1);
+        break;
+
+    case TARGET_FREEBSD_NR_dup2: /* dup2(2) */
+        ret = do_bsd_dup2(arg1, arg2);
+        break;
+
+    case TARGET_FREEBSD_NR_truncate: /* truncate(2) */
+        ret = do_bsd_truncate(cpu_env, arg1, arg2, arg3, arg4);
+        break;
+
+    case TARGET_FREEBSD_NR_ftruncate: /* ftruncate(2) */
+        ret = do_bsd_ftruncate(cpu_env, arg1, arg2, arg3, arg4);
+        break;
+
+    case TARGET_FREEBSD_NR_acct: /* acct(2) */
+        ret = do_bsd_acct(arg1);
+        break;
+
+    case TARGET_FREEBSD_NR_sync: /* sync(2) */
+        ret = do_bsd_sync();
+        break;
+
+    case TARGET_FREEBSD_NR_mount: /* mount(2) */
+        ret = do_bsd_mount(arg1, arg2, arg3, arg4);
+        break;
+
+    case TARGET_FREEBSD_NR_unmount: /* unmount(2) */
+        ret = do_bsd_unmount(arg1, arg2);
+        break;
+
+    case TARGET_FREEBSD_NR_nmount: /* nmount(2) */
+        ret = do_bsd_nmount(arg1, arg2, arg3);
+        break;
+
+    case TARGET_FREEBSD_NR_symlink: /* symlink(2) */
+        ret = do_bsd_symlink(arg1, arg2);
+        break;
+
+    case TARGET_FREEBSD_NR_symlinkat: /* symlinkat(2) */
+        ret = do_bsd_symlinkat(arg1, arg2, arg3);
+        break;
+
+    case TARGET_FREEBSD_NR_readlink: /* readlink(2) */
+        ret = do_bsd_readlink(cpu_env, arg1, arg2, arg3);
+        break;
+
+    case TARGET_FREEBSD_NR_readlinkat: /* readlinkat(2) */
+        ret = do_bsd_readlinkat(arg1, arg2, arg3, arg4);
+        break;
+
+    case TARGET_FREEBSD_NR_chmod: /* chmod(2) */
+        ret = do_bsd_chmod(arg1, arg2);
+        break;
+
+    case TARGET_FREEBSD_NR_fchmod: /* fchmod(2) */
+        ret = do_bsd_fchmod(arg1, arg2);
+        break;
+
+    case TARGET_FREEBSD_NR_lchmod: /* lchmod(2) */
+        ret = do_bsd_lchmod(arg1, arg2);
+        break;
+
+    case TARGET_FREEBSD_NR_fchmodat: /* fchmodat(2) */
+        ret = do_bsd_fchmodat(arg1, arg2, arg3, arg4);
+        break;
+
+    case TARGET_FREEBSD_NR_freebsd11_mknod: /* mknod(2) */
+        ret = do_bsd_freebsd11_mknod(arg1, arg2, arg3);
+        break;
+
+    case TARGET_FREEBSD_NR_freebsd11_mknodat: /* mknodat(2) */
+        ret = do_bsd_freebsd11_mknodat(arg1, arg2, arg3, arg4);
+        break;
+
+    case TARGET_FREEBSD_NR_mknodat: /* mknodat(2) */
+        ret = do_bsd_mknodat(cpu_env, arg1, arg2, arg3, arg4, arg5, arg6);
+        break;
+
+    case TARGET_FREEBSD_NR_chown: /* chown(2) */
+        ret = do_bsd_chown(arg1, arg2, arg3);
+        break;
+
+    case TARGET_FREEBSD_NR_fchown: /* fchown(2) */
+        ret = do_bsd_fchown(arg1, arg2, arg3);
+        break;
+
+    case TARGET_FREEBSD_NR_lchown: /* lchown(2) */
+        ret = do_bsd_lchown(arg1, arg2, arg3);
+        break;
+
+    case TARGET_FREEBSD_NR_fchownat: /* fchownat(2) */
+        ret = do_bsd_fchownat(arg1, arg2, arg3, arg4, arg5);
+        break;
+
+    case TARGET_FREEBSD_NR_chflags: /* chflags(2) */
+        ret = do_bsd_chflags(arg1, arg2);
+        break;
+
+    case TARGET_FREEBSD_NR_lchflags: /* lchflags(2) */
+        ret = do_bsd_lchflags(arg1, arg2);
+        break;
+
+    case TARGET_FREEBSD_NR_fchflags: /* fchflags(2) */
+        ret = do_bsd_fchflags(arg1, arg2);
+        break;
+
+    case TARGET_FREEBSD_NR_chroot: /* chroot(2) */
+        ret = do_bsd_chroot(arg1);
+        break;
+
+    case TARGET_FREEBSD_NR_flock: /* flock(2) */
+        ret = do_bsd_flock(arg1, arg2);
+        break;
+
+    case TARGET_FREEBSD_NR_mkfifo: /* mkfifo(2) */
+        ret = do_bsd_mkfifo(arg1, arg2);
+        break;
+
+    case TARGET_FREEBSD_NR_mkfifoat: /* mkfifoat(2) */
+        ret = do_bsd_mkfifoat(arg1, arg2, arg3);
+        break;
+
+    case TARGET_FREEBSD_NR_pathconf: /* pathconf(2) */
+        ret = do_bsd_pathconf(arg1, arg2);
+        break;
+
+    case TARGET_FREEBSD_NR_lpathconf: /* lpathconf(2) */
+        ret = do_bsd_lpathconf(arg1, arg2);
+        break;
+
+    case TARGET_FREEBSD_NR_fpathconf: /* fpathconf(2) */
+        ret = do_bsd_fpathconf(arg1, arg2);
+        break;
+
+    case TARGET_FREEBSD_NR_undelete: /* undelete(2) */
+        ret = do_bsd_undelete(arg1);
+        break;
+
     default:
         qemu_log_mask(LOG_UNIMP, "Unsupported syscall: %d\n", num);
         ret = -TARGET_ENOSYS;


@@ -226,4 +226,8 @@ type safe_##name(type1 arg1, type2 arg2, type3 arg3, type4 arg4, \
     return safe_syscall(SYS_##name, arg1, arg2, arg3, arg4, arg5, arg6); \
 }

+/* So far all target and host bitmasks are the same */
+#define target_to_host_bitmask(x, tbl) (x)
+#define host_to_target_bitmask(x, tbl) (x)
+
 #endif /* SYSCALL_DEFS_H */


@@ -1,3 +1,7 @@
+if not have_user
+  subdir_done()
+endif
+
 common_user_inc += include_directories('host/' / host_arch)

 user_ss.add(files(


@@ -2,4 +2,5 @@ TARGET_ARCH=aarch64
 TARGET_BASE_ARCH=arm
 TARGET_XML_FILES= gdb-xml/aarch64-core.xml gdb-xml/aarch64-fpu.xml
 TARGET_HAS_BFLT=y
+CONFIG_SEMIHOSTING=y
 CONFIG_ARM_COMPATIBLE_SEMIHOSTING=y


@@ -3,4 +3,5 @@ TARGET_BASE_ARCH=arm
 TARGET_BIG_ENDIAN=y
 TARGET_XML_FILES= gdb-xml/aarch64-core.xml gdb-xml/aarch64-fpu.xml
 TARGET_HAS_BFLT=y
+CONFIG_SEMIHOSTING=y
 CONFIG_ARM_COMPATIBLE_SEMIHOSTING=y


@@ -3,4 +3,5 @@ TARGET_SYSTBL_ABI=common,oabi
 TARGET_SYSTBL=syscall.tbl
 TARGET_XML_FILES= gdb-xml/arm-core.xml gdb-xml/arm-vfp.xml gdb-xml/arm-vfp3.xml gdb-xml/arm-vfp-sysregs.xml gdb-xml/arm-neon.xml gdb-xml/arm-m-profile.xml gdb-xml/arm-m-profile-mve.xml
 TARGET_HAS_BFLT=y
+CONFIG_SEMIHOSTING=y
 CONFIG_ARM_COMPATIBLE_SEMIHOSTING=y


@@ -4,4 +4,5 @@ TARGET_SYSTBL=syscall.tbl
 TARGET_BIG_ENDIAN=y
 TARGET_XML_FILES= gdb-xml/arm-core.xml gdb-xml/arm-vfp.xml gdb-xml/arm-vfp3.xml gdb-xml/arm-vfp-sysregs.xml gdb-xml/arm-neon.xml gdb-xml/arm-m-profile.xml gdb-xml/arm-m-profile-mve.xml
 TARGET_HAS_BFLT=y
+CONFIG_SEMIHOSTING=y
 CONFIG_ARM_COMPATIBLE_SEMIHOSTING=y


@@ -0,0 +1,3 @@
+# Default configuration for loongarch64-linux-user
+TARGET_ARCH=loongarch64
+TARGET_BASE_ARCH=loongarch


@@ -2,3 +2,4 @@ TARGET_ARCH=loongarch64
 TARGET_BASE_ARCH=loongarch
 TARGET_SUPPORTS_MTTCG=y
 TARGET_XML_FILES= gdb-xml/loongarch-base64.xml gdb-xml/loongarch-fpu64.xml
+TARGET_NEED_FDT=y


@@ -2,4 +2,5 @@ TARGET_ARCH=riscv32
 TARGET_BASE_ARCH=riscv
 TARGET_ABI_DIR=riscv
 TARGET_XML_FILES= gdb-xml/riscv-32bit-cpu.xml gdb-xml/riscv-32bit-fpu.xml gdb-xml/riscv-64bit-fpu.xml gdb-xml/riscv-32bit-virtual.xml
+CONFIG_SEMIHOSTING=y
 CONFIG_ARM_COMPATIBLE_SEMIHOSTING=y


@@ -2,4 +2,5 @@ TARGET_ARCH=riscv64
 TARGET_BASE_ARCH=riscv
 TARGET_ABI_DIR=riscv
 TARGET_XML_FILES= gdb-xml/riscv-64bit-cpu.xml gdb-xml/riscv-32bit-fpu.xml gdb-xml/riscv-64bit-fpu.xml gdb-xml/riscv-64bit-virtual.xml
+CONFIG_SEMIHOSTING=y
 CONFIG_ARM_COMPATIBLE_SEMIHOSTING=y

configure

@@ -329,7 +329,7 @@ fi
 fdt="auto"

 # 2. Automatically enable/disable other options
-tcg="enabled"
+tcg="auto"
 cfi="false"

 # parse CC options second
@@ -678,6 +678,21 @@ werror=""
 as_shared_lib="no"
 as_static_lib="no"

+meson_option_build_array() {
+  printf '['
+  (if test "$targetos" == windows; then
+    IFS=\;
+  else
+    IFS=:
+  fi
+  for e in $1; do
+    e=${e/'\'/'\\'}
+    e=${e/\"/'\"'}
+    printf '"""%s""",' "$e"
+  done)
+  printf ']\n'
+}
+
 . $source_path/scripts/meson-buildoptions.sh

 meson_options=
@@ -1368,13 +1383,6 @@ static THREAD int tls_var;
 int main(void) { return tls_var; }
 EOF

-# Check we support -fno-pie and -no-pie first; we will need the former for
-# building ROMs, and both for everything if --disable-pie is passed.
-if compile_prog "-Werror -fno-pie" "-no-pie"; then
-  CFLAGS_NOPIE="-fno-pie"
-  LDFLAGS_NOPIE="-no-pie"
-fi
-
 if test "$static" = "yes"; then
   if test "$pie" != "no" && compile_prog "-Werror -fPIE -DPIE" "-static-pie"; then
     CONFIGURE_CFLAGS="-fPIE -DPIE $CONFIGURE_CFLAGS"
@@ -1387,8 +1395,10 @@ if test "$static" = "yes"; then
     pie="no"
   fi
 elif test "$pie" = "no"; then
-  CONFIGURE_CFLAGS="$CFLAGS_NOPIE $CONFIGURE_CFLAGS"
-  CONFIGURE_LDFLAGS="$LDFLAGS_NOPIE $CONFIGURE_LDFLAGS"
+  if compile_prog "-Werror -fno-pie" "-no-pie"; then
+    CONFIGURE_CFLAGS="-fno-pie $CONFIGURE_CFLAGS"
+    CONFIGURE_LDFLAGS="-no-pie $CONFIGURE_LDFLAGS"
+  fi
 elif compile_prog "-Werror -fPIE -DPIE" "-pie"; then
   CONFIGURE_CFLAGS="-fPIE -DPIE $CONFIGURE_CFLAGS"
   CONFIGURE_LDFLAGS="-pie $CONFIGURE_LDFLAGS"
@@ -1431,11 +1441,6 @@ EOF
   fi
 fi

-if test "$tcg" = "enabled"; then
-  git_submodules="$git_submodules tests/fp/berkeley-testfloat-3"
-  git_submodules="$git_submodules tests/fp/berkeley-softfloat-3"
-fi
-
 if test -z "${target_list+xxx}" ; then
     default_targets=yes
     for target in $default_target_list; do
@@ -1466,6 +1471,19 @@ case " $target_list " in
     ;;
 esac

+if test "$tcg" = "auto"; then
+  if test -z "$target_list"; then
+    tcg="disabled"
+  else
+    tcg="enabled"
+  fi
+fi
+
+if test "$tcg" = "enabled"; then
+  git_submodules="$git_submodules tests/fp/berkeley-testfloat-3"
+  git_submodules="$git_submodules tests/fp/berkeley-softfloat-3"
+fi
+
 feature_not_found() {
   feature=$1
   remedy=$2
@@ -1880,7 +1898,7 @@ fi
 : ${cross_cc_hexagon="hexagon-unknown-linux-musl-clang"}
 : ${cross_cc_cflags_hexagon="-mv67 -O2 -static"}
 : ${cross_cc_cflags_i386="-m32"}
-: ${cross_cc_cflags_ppc="-m32"}
+: ${cross_cc_cflags_ppc="-m32 -mbig-endian"}
 : ${cross_cc_cflags_ppc64="-m64 -mbig-endian"}
 : ${cross_cc_ppc64le="$cross_cc_ppc64"}
 : ${cross_cc_cflags_ppc64le="-m64 -mlittle-endian"}
@@ -1890,6 +1908,7 @@ fi
 : ${cross_cc_cflags_x86_64="-m64"}

 compute_target_variable() {
+  eval "$2="
   if eval test -n "\"\${cross_prefix_$1}\""; then
     if eval has "\"\${cross_prefix_$1}\$3\""; then
       eval "$2=\"\${cross_prefix_$1}\$3\""
@@ -1897,8 +1916,20 @@ compute_target_variable() {
   fi
 }

+# probe_target_compiler TARGET
+#
+# Look for a compiler for the given target, either native or cross.
+# Set variables target_* if a compiler is found, and container_cross_*
+# if a Docker-based cross-compiler image is known for the target.
+# Set got_cross_cc to yes/no depending on whether a non-container-based
+# compiler was found.
+#
+# If TARGET is a user-mode emulation target, also set build_static to
+# "y" if static linking is possible.
+#
 probe_target_compiler() {
   # reset all output variables
+  got_cross_cc=no
   container_image=
   container_hosts=
   container_cross_cc=
@@ -1909,16 +1940,9 @@ probe_target_compiler() {
   container_cross_objcopy=
   container_cross_ranlib=
   container_cross_strip=
-  target_cc=
-  target_ar=
-  target_as=
-  target_ld=
-  target_nm=
-  target_objcopy=
-  target_ranlib=
-  target_strip=
-  case $1 in
+
+  target_arch=${1%%-*}
+  case $target_arch in
     aarch64) container_hosts="x86_64 aarch64" ;;
     alpha) container_hosts=x86_64 ;;
     arm) container_hosts="x86_64 aarch64" ;;
@@ -1926,6 +1950,7 @@ probe_target_compiler() {
     hexagon) container_hosts=x86_64 ;;
     hppa) container_hosts=x86_64 ;;
     i386) container_hosts=x86_64 ;;
+    loongarch64) container_hosts=x86_64 ;;
     m68k) container_hosts=x86_64 ;;
     microblaze) container_hosts=x86_64 ;;
     mips64el) container_hosts=x86_64 ;;
@@ -1947,7 +1972,7 @@ probe_target_compiler() {
   for host in $container_hosts; do
     test "$container" != no || continue
     test "$host" = "$cpu" || continue
-    case $1 in
+    case $target_arch in
       aarch64)
         # We don't have any bigendian build tools so we only use this for AArch64
         container_image=debian-arm64-cross
@@ -1980,6 +2005,10 @@ probe_target_compiler() {
         container_image=fedora-i386-cross
         container_cross_prefix=
         ;;
+      loongarch64)
+        container_image=debian-loongarch-cross
+        container_cross_prefix=loongarch64-unknown-linux-gnu-
+        ;;
       m68k)
         container_image=debian-m68k-cross
         container_cross_prefix=m68k-linux-gnu-
@@ -2063,54 +2092,116 @@ probe_target_compiler() {
     : ${container_cross_strip:=${container_cross_prefix}strip}
   done

-  eval "target_cflags=\${cross_cc_cflags_$1}"
-  if eval test -n "\"\${cross_cc_$1}\""; then
-    if eval has "\"\${cross_cc_$1}\""; then
-      eval "target_cc=\"\${cross_cc_$1}\""
-    fi
-  else
-    compute_target_variable $1 target_cc gcc
-  fi
-  target_ccas=$target_cc
-  compute_target_variable $1 target_ar ar
-  compute_target_variable $1 target_as as
-  compute_target_variable $1 target_ld ld
-  compute_target_variable $1 target_nm nm
-  compute_target_variable $1 target_objcopy objcopy
-  compute_target_variable $1 target_ranlib ranlib
-  compute_target_variable $1 target_strip strip
-  if test "$1" = $cpu; then
-    : ${target_cc:=$cc}
-    : ${target_ccas:=$ccas}
-    : ${target_as:=$as}
-    : ${target_ld:=$ld}
-    : ${target_ar:=$ar}
-    : ${target_as:=$as}
-    : ${target_ld:=$ld}
-    : ${target_nm:=$nm}
-    : ${target_objcopy:=$objcopy}
-    : ${target_ranlib:=$ranlib}
-    : ${target_strip:=$strip}
-  fi
-  if test -n "$target_cc"; then
-    case $1 in
-      i386|x86_64)
-        if $target_cc --version | grep -qi "clang"; then
-          unset target_cc
-        fi
-        ;;
-    esac
-  fi
+  local t try
+  try=cross
+  case "$target_arch:$cpu" in
+    aarch64_be:aarch64 | \
+    armeb:arm | \
+    i386:x86_64 | \
+    mips*:mips64 | \
+    ppc*:ppc64 | \
+    sparc:sparc64 | \
+    "$cpu:$cpu")
+      try='native cross' ;;
+  esac
+  eval "target_cflags=\${cross_cc_cflags_$target_arch}"
+  for t in $try; do
+    case $t in
+      native)
+        target_cc=$cc
+        target_ccas=$ccas
+        target_ar=$ar
+        target_as=$as
+        target_ld=$ld
+        target_nm=$nm
+        target_objcopy=$objcopy
+        target_ranlib=$ranlib
+        target_strip=$strip
+        ;;
+      cross)
+        target_cc=
+        if eval test -n "\"\${cross_cc_$target_arch}\""; then
+          if eval has "\"\${cross_cc_$target_arch}\""; then
+            eval "target_cc=\"\${cross_cc_$target_arch}\""
+          fi
+        else
+          compute_target_variable $target_arch target_cc gcc
+        fi
+        target_ccas=$target_cc
+        compute_target_variable $target_arch target_ar ar
+        compute_target_variable $target_arch target_as as
+        compute_target_variable $target_arch target_ld ld
+        compute_target_variable $target_arch target_nm nm
+        compute_target_variable $target_arch target_objcopy objcopy
+        compute_target_variable $target_arch target_ranlib ranlib
+        compute_target_variable $target_arch target_strip strip
+        ;;
+    esac
+
+    if test -n "$target_cc"; then
+      case $target_arch in
+        i386|x86_64)
+          if $target_cc --version | grep -qi "clang"; then
+            continue
+          fi
+          ;;
+      esac
+    elif test -n "$target_as" && test -n "$target_ld"; then
+      # Special handling for assembler only targets
+      case $target in
+        tricore-softmmu)
+          build_static=
+          got_cross_cc=yes
+          break
+          ;;
+        *)
+          continue
+          ;;
+      esac
+    else
+      continue
+    fi
+
+    write_c_skeleton
+    case $1 in
+      *-softmmu)
+        if do_compiler "$target_cc" $target_cflags -o $TMPO -c $TMPC &&
+           do_compiler "$target_cc" $target_cflags -r -nostdlib -o "${TMPDIR1}/${TMPB}2.o" "$TMPO" -lgcc; then
+          got_cross_cc=yes
+          break
+        fi
+        ;;
+      *)
+        if do_compiler "$target_cc" $target_cflags -o $TMPE $TMPC -static ; then
+          build_static=y
+          got_cross_cc=yes
+          break
+        fi
+        if do_compiler "$target_cc" $target_cflags -o $TMPE $TMPC ; then
+          build_static=
+          got_cross_cc=yes
+          break
+        fi
+        ;;
+    esac
+  done
+
+  if test $got_cross_cc != yes; then
+    build_static=
+    target_cc=
+    target_ccas=
+    target_cflags=
+    target_ar=
+    target_as=
+    target_ld=
+    target_nm=
+    target_objcopy=
+    target_ranlib=
+    target_strip=
+  fi
 }

-probe_target_compilers() {
-  for i; do
-    probe_target_compiler $i
-    test -n "$target_cc" && return 0
-  done
-}
-
 write_target_makefile() {
+  echo "EXTRA_CFLAGS=$target_cflags"
   if test -n "$target_cc"; then
     echo "CC=$target_cc"
     echo "CCAS=$target_ccas"
@@ -2139,6 +2230,7 @@ write_target_makefile() {
 }

 write_container_target_makefile() {
+  echo "EXTRA_CFLAGS=$target_cflags"
   if test -n "$container_cross_cc"; then
     echo "CC=\$(DOCKER_SCRIPT) cc --cc $container_cross_cc -i qemu/$container_image -s $source_path --"
     echo "CCAS=\$(DOCKER_SCRIPT) cc --cc $container_cross_cc -i qemu/$container_image -s $source_path --"
@@ -2260,7 +2352,7 @@ done
 # Mac OS X ships with a broken assembler
 roms=

-probe_target_compilers i386 x86_64
+probe_target_compiler i386-softmmu
 if test -n "$target_cc" &&
     test "$targetos" != "darwin" && test "$targetos" != "sunos" && \
     test "$targetos" != "haiku" && test "$softmmu" = yes ; then
@@ -2274,7 +2366,7 @@ if test -n "$target_cc" &&
   fi
 done
 if test -n "$ld_i386_emulation"; then
-  roms="optionrom"
+  roms="pc-bios/optionrom"
   config_mak=pc-bios/optionrom/config.mak
   echo "# Automatically generated by configure - do not modify" > $config_mak
   echo "TOPSRC_DIR=$source_path" >> $config_mak
@@ -2283,9 +2375,9 @@ if test -n "$target_cc" &&
   fi
 fi

-probe_target_compilers ppc ppc64
+probe_target_compiler ppc-softmmu
 if test -n "$target_cc" && test "$softmmu" = yes; then
-  roms="$roms vof"
+  roms="$roms pc-bios/vof"
   config_mak=pc-bios/vof/config.mak
   echo "# Automatically generated by configure - do not modify" > $config_mak
   echo "SRC_DIR=$source_path/pc-bios/vof" >> $config_mak
@@ -2294,7 +2386,7 @@ fi

 # Only build s390-ccw bios if the compiler has -march=z900 or -march=z10
 # (which is the lowest architecture level that Clang supports)
-probe_target_compiler s390x
+probe_target_compiler s390x-softmmu
 if test -n "$target_cc" && test "$softmmu" = yes; then
   write_c_skeleton
   do_compiler "$target_cc" $target_cc_cflags -march=z900 -o $TMPO -c $TMPC
@@ -2304,7 +2396,7 @@ if test -n "$target_cc" && test "$softmmu" = yes; then
     echo "WARNING: Your compiler does not support the z900!"
     echo "  The s390-ccw bios will only work with guest CPUs >= z10."
-  roms="$roms s390-ccw"
+  roms="$roms pc-bios/s390-ccw"
   config_mak=pc-bios/s390-ccw/config-host.mak
   echo "# Automatically generated by configure - do not modify" > $config_mak
   echo "SRC_PATH=$source_path/pc-bios/s390-ccw" >> $config_mak
@@ -2514,7 +2606,6 @@ tcg_tests_targets=
 for target in $target_list; do
   arch=${target%%-*}
-  probe_target_compiler ${arch}

   config_target_mak=tests/tcg/config-$target.mak
   echo "# Automatically generated by configure - do not modify" > $config_target_mak
@@ -2533,29 +2624,7 @@ for target in $target_list; do
       ;;
   esac

-  got_cross_cc=no
-  unset build_static
-
-  if test -n "$target_cc"; then
-    write_c_skeleton
-    if ! do_compiler "$target_cc" $target_cflags \
-       -o $TMPE $TMPC -static ; then
-      # For host systems we might get away with building without -static
-      if do_compiler "$target_cc" $target_cflags \
-         -o $TMPE $TMPC ; then
-        got_cross_cc=yes
-      fi
-    else
-      got_cross_cc=yes
-      build_static=y
-    fi
-  elif test -n "$target_as" && test -n "$target_ld"; then
-    # Special handling for assembler only tests
-    case $target in
-      tricore-softmmu) got_cross_cc=yes ;;
-    esac
-  fi
-
+  probe_target_compiler $target
   if test $got_cross_cc = yes; then
     # Test for compiler features for optional tests. We only do this
     # for cross compilers because ensuring the docker containers based
@@ -2629,7 +2698,6 @@ for target in $target_list; do
   if test $got_cross_cc = yes; then
     mkdir -p tests/tcg/$target
     echo "QEMU=$PWD/$qemu" >> $config_target_mak
-    echo "EXTRA_CFLAGS=$target_cflags" >> $config_target_mak
     echo "run-tcg-tests-$target: $qemu\$(EXESUF)" >> $makefile
     tcg_tests_targets="$tcg_tests_targets $target"
   fi
@@ -2763,13 +2831,12 @@ preserve_env CC
 preserve_env CFLAGS
 preserve_env CXX
 preserve_env CXXFLAGS
-preserve_env INSTALL
 preserve_env LD
 preserve_env LDFLAGS
 preserve_env LD_LIBRARY_PATH
-preserve_env LIBTOOL
 preserve_env MAKE
 preserve_env NM
-preserve_env OBJCFLAGS
 preserve_env OBJCOPY
 preserve_env PATH
 preserve_env PKG_CONFIG
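The `meson_option_build_array` helper added to configure turns a PATH-style separated list into a Meson array literal, splitting on `;` for Windows hosts and `:` elsewhere. A simplified re-implementation (assuming a non-Windows host and omitting the backslash/quote escaping) shows the shape of its output:

```shell
#!/usr/bin/env bash
# Simplified sketch of configure's meson_option_build_array:
# split on ':' via IFS inside a subshell, emit triple-quoted elements.
demo_build_array() {
  printf '['
  (IFS=:
   for e in $1; do
     printf '"""%s""",' "$e"
   done)
  printf ']\n'
}

demo_build_array "/usr/lib:/usr/local/lib"
# prints ["""/usr/lib""","""/usr/local/lib""",]
```

The subshell keeps the `IFS` change from leaking into the rest of the script, and the trailing comma inside `[...]` is legal in Meson array syntax.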


@@ -1,5 +1,4 @@
-# FIXME: broken on 32-bit architectures
 executable('vhost-user-blk', files('vhost-user-blk.c'),
            dependencies: [qemuutil, vhost_user],
-           build_by_default: false,
+           build_by_default: targetos == 'linux',
            install: false)


@@ -146,7 +146,7 @@ vub_readv(VubReq *req, struct iovec *iov, uint32_t iovcnt)
     req->size = vub_iov_size(iov, iovcnt);
     rc = preadv(vdev_blk->blk_fd, iov, iovcnt, req->sector_num * 512);
     if (rc < 0) {
-        fprintf(stderr, "%s, Sector %"PRIu64", Size %lu failed with %s\n",
+        fprintf(stderr, "%s, Sector %"PRIu64", Size %zu failed with %s\n",
                 vdev_blk->blk_name, req->sector_num, req->size,
                 strerror(errno));
         return -1;
@@ -169,7 +169,7 @@ vub_writev(VubReq *req, struct iovec *iov, uint32_t iovcnt)
     req->size = vub_iov_size(iov, iovcnt);
     rc = pwritev(vdev_blk->blk_fd, iov, iovcnt, req->sector_num * 512);
     if (rc < 0) {
-        fprintf(stderr, "%s, Sector %"PRIu64", Size %lu failed with %s\n",
+        fprintf(stderr, "%s, Sector %"PRIu64", Size %zu failed with %s\n",
                 vdev_blk->blk_name, req->sector_num, req->size,
                 strerror(errno));
         return -1;
@@ -188,7 +188,7 @@ vub_discard_write_zeroes(VubReq *req, struct iovec *iov, uint32_t iovcnt,
     size = vub_iov_size(iov, iovcnt);
     if (size != sizeof(*desc)) {
-        fprintf(stderr, "Invalid size %ld, expect %ld\n", size, sizeof(*desc));
+        fprintf(stderr, "Invalid size %zd, expect %zd\n", size, sizeof(*desc));
        return -1;
     }
     buf = g_new0(char, size);

cpu.c

@@ -70,15 +70,21 @@ struct libafl_hook {
 struct libafl_hook* libafl_qemu_hooks[LIBAFL_TABLES_SIZE];
 size_t libafl_qemu_hooks_num = 0;
 
-__thread CPUArchState *libafl_qemu_env;
+__thread int libafl_valid_current_cpu = 0;
 
 void libafl_helper_table_add(TCGHelperInfo* info);
 
-static GByteArray *libafl_qemu_mem_buf = NULL;
+static __thread GByteArray *libafl_qemu_mem_buf = NULL;
 
-int libafl_qemu_write_reg(int reg, uint8_t* val);
-int libafl_qemu_read_reg(int reg, uint8_t* val);
-int libafl_qemu_num_regs(void);
+CPUState* libafl_qemu_get_cpu(int cpu_index);
+int libafl_qemu_num_cpus(void);
+CPUState* libafl_qemu_current_cpu(void);
+int libafl_qemu_cpu_index(CPUState*);
+int libafl_qemu_write_reg(CPUState* cpu, int reg, uint8_t* val);
+int libafl_qemu_read_reg(CPUState* cpu, int reg, uint8_t* val);
+int libafl_qemu_num_regs(CPUState* cpu);
 int libafl_qemu_set_breakpoint(target_ulong addr);
 int libafl_qemu_remove_breakpoint(target_ulong addr);
 size_t libafl_qemu_set_hook(target_ulong pc, void (*callback)(target_ulong, uint64_t),
@@ -88,16 +94,54 @@ int libafl_qemu_remove_hook(size_t num, int invalidate);
 struct libafl_hook* libafl_search_hook(target_ulong addr);
 void libafl_flush_jit(void);
 
-int libafl_qemu_write_reg(int reg, uint8_t* val)
-{
-    CPUState *cpu = current_cpu;
-    if (!cpu) {
-        cpu = env_cpu(libafl_qemu_env);
-        if (!cpu) {
-            return 0;
-        }
-    }
+/*
+void* libafl_qemu_g2h(CPUState *cpu, target_ulong x);
+target_ulong libafl_qemu_h2g(CPUState *cpu, void* x);
+
+void* libafl_qemu_g2h(CPUState *cpu, target_ulong x)
+{
+    return g2h(cpu, x);
+}
+
+target_ulong libafl_qemu_h2g(CPUState *cpu, void* x)
+{
+    return h2g(cpu, x);
+}
+*/
+
+CPUState* libafl_qemu_get_cpu(int cpu_index)
+{
+    CPUState *cpu;
+    CPU_FOREACH(cpu) {
+        if (cpu->cpu_index == cpu_index)
+            return cpu;
+    }
+    return NULL;
+}
+
+int libafl_qemu_num_cpus(void)
+{
+    CPUState *cpu;
+    int num = 0;
+    CPU_FOREACH(cpu) {
+        num++;
+    }
+    return num;
+}
+
+CPUState* libafl_qemu_current_cpu(void)
+{
+    return current_cpu;
+}
+
+int libafl_qemu_cpu_index(CPUState* cpu)
+{
+    if (cpu) return cpu->cpu_index;
+    return -1;
+}
+
+int libafl_qemu_write_reg(CPUState* cpu, int reg, uint8_t* val)
+{
     CPUClass *cc = CPU_GET_CLASS(cpu);
     if (reg < cc->gdb_num_core_regs) {
         return cc->gdb_write_register(cpu, val, reg);
@@ -105,16 +149,8 @@ int libafl_qemu_write_reg(int reg, uint8_t* val)
     return 0;
 }
 
-int libafl_qemu_read_reg(int reg, uint8_t* val)
+int libafl_qemu_read_reg(CPUState* cpu, int reg, uint8_t* val)
 {
-    CPUState *cpu = current_cpu;
-    if (!cpu) {
-        cpu = env_cpu(libafl_qemu_env);
-        if (!cpu) {
-            return 0;
-        }
-    }
     if (libafl_qemu_mem_buf == NULL) {
         libafl_qemu_mem_buf = g_byte_array_sized_new(64);
     }
@@ -131,16 +167,8 @@ int libafl_qemu_read_reg(int reg, uint8_t* val)
     return 0;
 }
 
-int libafl_qemu_num_regs(void)
+int libafl_qemu_num_regs(CPUState* cpu)
 {
-    CPUState *cpu = current_cpu;
-    if (!cpu) {
-        cpu = env_cpu(libafl_qemu_env);
-        if (!cpu) {
-            return 0;
-        }
-    }
     CPUClass *cc = CPU_GET_CLASS(cpu);
     return cc->gdb_num_core_regs;
 }


@ -73,6 +73,12 @@ static int cpu_get_free_index(void)
} }
CPUTailQ cpus = QTAILQ_HEAD_INITIALIZER(cpus); CPUTailQ cpus = QTAILQ_HEAD_INITIALIZER(cpus);
static unsigned int cpu_list_generation_id;
unsigned int cpu_list_generation_id_get(void)
{
return cpu_list_generation_id;
}
void cpu_list_add(CPUState *cpu) void cpu_list_add(CPUState *cpu)
{ {
@ -84,6 +90,7 @@ void cpu_list_add(CPUState *cpu)
assert(!cpu_index_auto_assigned); assert(!cpu_index_auto_assigned);
} }
QTAILQ_INSERT_TAIL_RCU(&cpus, cpu, node); QTAILQ_INSERT_TAIL_RCU(&cpus, cpu, node);
cpu_list_generation_id++;
} }
void cpu_list_remove(CPUState *cpu) void cpu_list_remove(CPUState *cpu)
@ -96,6 +103,7 @@ void cpu_list_remove(CPUState *cpu)
QTAILQ_REMOVE_RCU(&cpus, cpu, node); QTAILQ_REMOVE_RCU(&cpus, cpu, node);
cpu->cpu_index = UNASSIGNED_CPU_INDEX; cpu->cpu_index = UNASSIGNED_CPU_INDEX;
cpu_list_generation_id++;
} }
CPUState *qemu_get_cpu(int index) CPUState *qemu_get_cpu(int index)


@@ -495,7 +495,7 @@ qcrypto_block_luks_load_header(QCryptoBlock *block,
                                void *opaque,
                                Error **errp)
 {
-    ssize_t rv;
+    int rv;
     size_t i;
     QCryptoBlockLUKS *luks = block->opaque;
@@ -856,7 +856,7 @@ qcrypto_block_luks_store_key(QCryptoBlock *block,
                              QCRYPTO_BLOCK_LUKS_SECTOR_SIZE,
                              splitkey, splitkeylen,
                              opaque,
-                             errp) != splitkeylen) {
+                             errp) < 0) {
         goto cleanup;
     }
@@ -903,7 +903,7 @@ qcrypto_block_luks_load_key(QCryptoBlock *block,
     g_autofree uint8_t *splitkey = NULL;
     size_t splitkeylen;
     g_autofree uint8_t *possiblekey = NULL;
-    ssize_t rv;
+    int rv;
     g_autoptr(QCryptoCipher) cipher = NULL;
     uint8_t keydigest[QCRYPTO_BLOCK_LUKS_DIGEST_LEN];
     g_autoptr(QCryptoIVGen) ivgen = NULL;
@@ -1193,7 +1193,7 @@ qcrypto_block_luks_erase_key(QCryptoBlock *block,
                              garbagesplitkey,
                              splitkeylen,
                              opaque,
-                             &local_err) != splitkeylen) {
+                             &local_err) < 0) {
         error_propagate(errp, local_err);
         return -1;
     }


@ -115,7 +115,7 @@ QCryptoBlock *qcrypto_block_create(QCryptoBlockCreateOptions *options,
} }
static ssize_t qcrypto_block_headerlen_hdr_init_func(QCryptoBlock *block, static int qcrypto_block_headerlen_hdr_init_func(QCryptoBlock *block,
size_t headerlen, void *opaque, Error **errp) size_t headerlen, void *opaque, Error **errp)
{ {
size_t *headerlenp = opaque; size_t *headerlenp = opaque;
@ -126,12 +126,12 @@ static ssize_t qcrypto_block_headerlen_hdr_init_func(QCryptoBlock *block,
} }
static ssize_t qcrypto_block_headerlen_hdr_write_func(QCryptoBlock *block, static int qcrypto_block_headerlen_hdr_write_func(QCryptoBlock *block,
size_t offset, const uint8_t *buf, size_t buflen, size_t offset, const uint8_t *buf, size_t buflen,
void *opaque, Error **errp) void *opaque, Error **errp)
{ {
/* Discard the bytes, we're not actually writing to an image */ /* Discard the bytes, we're not actually writing to an image */
return buflen; return 0;
} }


@ -178,9 +178,6 @@ static void initialize_debug_host(CPUDebug *s)
#endif #endif
#elif defined(__aarch64__) #elif defined(__aarch64__)
s->info.cap_arch = CS_ARCH_ARM64; s->info.cap_arch = CS_ARCH_ARM64;
# ifdef CONFIG_ARM_A64_DIS
s->info.print_insn = print_insn_arm_a64;
# endif
#elif defined(__alpha__) #elif defined(__alpha__)
s->info.print_insn = print_insn_alpha; s->info.print_insn = print_insn_alpha;
#elif defined(__sparc__) #elif defined(__sparc__)


@@ -1,101 +0,0 @@
/*
* ARM A64 disassembly output wrapper to libvixl
* Copyright (c) 2013 Linaro Limited
* Written by Claudio Fontana
*
* This program is free software: you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation, either version 2 of the License, or
* (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program. If not, see <http://www.gnu.org/licenses/>.
*/
#include "qemu/osdep.h"
#include "disas/dis-asm.h"
#include "vixl/a64/disasm-a64.h"
using namespace vixl;
static Decoder *vixl_decoder = NULL;
static Disassembler *vixl_disasm = NULL;
/* We don't use libvixl's PrintDisassembler because its output
* is a little unhelpful (trailing newlines, for example).
* Instead we use our own very similar variant so we have
* control over the format.
*/
class QEMUDisassembler : public Disassembler {
public:
QEMUDisassembler() : printf_(NULL), stream_(NULL) { }
~QEMUDisassembler() { }
void SetStream(FILE *stream) {
stream_ = stream;
}
void SetPrintf(fprintf_function printf_fn) {
printf_ = printf_fn;
}
protected:
virtual void ProcessOutput(const Instruction *instr) {
printf_(stream_, "%08" PRIx32 " %s",
instr->InstructionBits(), GetOutput());
}
private:
fprintf_function printf_;
FILE *stream_;
};
static int vixl_is_initialized(void)
{
return vixl_decoder != NULL;
}
static void vixl_init() {
vixl_decoder = new Decoder();
vixl_disasm = new QEMUDisassembler();
vixl_decoder->AppendVisitor(vixl_disasm);
}
#define INSN_SIZE 4
/* Disassemble ARM A64 instruction. This is our only entry
* point from QEMU's C code.
*/
int print_insn_arm_a64(uint64_t addr, disassemble_info *info)
{
uint8_t bytes[INSN_SIZE];
uint32_t instrval;
const Instruction *instr;
int status;
status = info->read_memory_func(addr, bytes, INSN_SIZE, info);
if (status != 0) {
info->memory_error_func(status, addr, info);
return -1;
}
if (!vixl_is_initialized()) {
vixl_init();
}
((QEMUDisassembler *)vixl_disasm)->SetPrintf(info->fprintf_func);
((QEMUDisassembler *)vixl_disasm)->SetStream(info->stream);
instrval = bytes[0] | bytes[1] << 8 | bytes[2] << 16 | bytes[3] << 24;
instr = reinterpret_cast<const Instruction *>(&instrval);
vixl_disasm->MapCodeAddress(addr, instr);
vixl_decoder->Decode(instr);
return INSN_SIZE;
}


@@ -1,30 +0,0 @@
LICENCE
=======
The software in this repository is covered by the following licence.
// Copyright 2013, ARM Limited
// All rights reserved.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are met:
//
// * Redistributions of source code must retain the above copyright notice,
// this list of conditions and the following disclaimer.
// * Redistributions in binary form must reproduce the above copyright notice,
// this list of conditions and the following disclaimer in the documentation
// and/or other materials provided with the distribution.
// * Neither the name of ARM Limited nor the names of its contributors may be
// used to endorse or promote products derived from this software without
// specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS CONTRIBUTORS "AS IS" AND
// ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
// WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
// DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE
// FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
// DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
// SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
// CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
// OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
// OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.


@@ -1,11 +0,0 @@
The code in this directory is a subset of libvixl:
https://github.com/armvixl/vixl
(specifically, it is the set of files needed for disassembly only,
taken from libvixl 1.12).
Bugfixes should preferably be sent upstream initially.
The disassembler does not currently support the entire A64 instruction
set. Notably:
* Limited support for system instructions.
* A few miscellaneous integer and floating point instructions are missing.


@@ -1,7 +0,0 @@
libvixl_ss.add(files(
'vixl/a64/decoder-a64.cc',
'vixl/a64/disasm-a64.cc',
'vixl/a64/instructions-a64.cc',
'vixl/compiler-intrinsics.cc',
'vixl/utils.cc',
))

File diff suppressed because it is too large

File diff suppressed because it is too large


@@ -1,83 +0,0 @@
// Copyright 2014, ARM Limited
// All rights reserved.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are met:
//
// * Redistributions of source code must retain the above copyright notice,
// this list of conditions and the following disclaimer.
// * Redistributions in binary form must reproduce the above copyright notice,
// this list of conditions and the following disclaimer in the documentation
// and/or other materials provided with the distribution.
// * Neither the name of ARM Limited nor the names of its contributors may be
// used to endorse or promote products derived from this software without
// specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS CONTRIBUTORS "AS IS" AND
// ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
// WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
// DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE
// FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
// DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
// SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
// CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
// OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
// OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
#ifndef VIXL_CPU_A64_H
#define VIXL_CPU_A64_H
#include "vixl/globals.h"
#include "vixl/a64/instructions-a64.h"
namespace vixl {
class CPU {
public:
// Initialise CPU support.
static void SetUp();
// Ensures the data at a given address and with a given size is the same for
// the I and D caches. I and D caches are not automatically coherent on ARM
// so this operation is required before any dynamically generated code can
// safely run.
static void EnsureIAndDCacheCoherency(void *address, size_t length);
// Handle tagged pointers.
template <typename T>
static T SetPointerTag(T pointer, uint64_t tag) {
VIXL_ASSERT(is_uintn(kAddressTagWidth, tag));
// Use C-style casts to get static_cast behaviour for integral types (T),
// and reinterpret_cast behaviour for other types.
uint64_t raw = (uint64_t)pointer;
VIXL_STATIC_ASSERT(sizeof(pointer) == sizeof(raw));
raw = (raw & ~kAddressTagMask) | (tag << kAddressTagOffset);
return (T)raw;
}
template <typename T>
static uint64_t GetPointerTag(T pointer) {
// Use C-style casts to get static_cast behaviour for integral types (T),
// and reinterpret_cast behaviour for other types.
uint64_t raw = (uint64_t)pointer;
VIXL_STATIC_ASSERT(sizeof(pointer) == sizeof(raw));
return (raw & kAddressTagMask) >> kAddressTagOffset;
}
private:
// Return the content of the cache type register.
static uint32_t GetCacheType();
// I and D cache line size in bytes.
static unsigned icache_line_size_;
static unsigned dcache_line_size_;
};
} // namespace vixl
#endif // VIXL_CPU_A64_H


@@ -1,877 +0,0 @@
// Copyright 2014, ARM Limited
// All rights reserved.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are met:
//
// * Redistributions of source code must retain the above copyright notice,
// this list of conditions and the following disclaimer.
// * Redistributions in binary form must reproduce the above copyright notice,
// this list of conditions and the following disclaimer in the documentation
// and/or other materials provided with the distribution.
// * Neither the name of ARM Limited nor the names of its contributors may be
// used to endorse or promote products derived from this software without
// specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS CONTRIBUTORS "AS IS" AND
// ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
// WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
// DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE
// FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
// DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
// SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
// CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
// OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
// OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
#include "vixl/globals.h"
#include "vixl/utils.h"
#include "vixl/a64/decoder-a64.h"
namespace vixl {
void Decoder::DecodeInstruction(const Instruction *instr) {
if (instr->Bits(28, 27) == 0) {
VisitUnallocated(instr);
} else {
switch (instr->Bits(27, 24)) {
// 0: PC relative addressing.
case 0x0: DecodePCRelAddressing(instr); break;
// 1: Add/sub immediate.
case 0x1: DecodeAddSubImmediate(instr); break;
// A: Logical shifted register.
// Add/sub with carry.
// Conditional compare register.
// Conditional compare immediate.
// Conditional select.
// Data processing 1 source.
// Data processing 2 source.
// B: Add/sub shifted register.
// Add/sub extended register.
// Data processing 3 source.
case 0xA:
case 0xB: DecodeDataProcessing(instr); break;
// 2: Logical immediate.
// Move wide immediate.
case 0x2: DecodeLogical(instr); break;
// 3: Bitfield.
// Extract.
case 0x3: DecodeBitfieldExtract(instr); break;
// 4: Unconditional branch immediate.
// Exception generation.
// Compare and branch immediate.
// 5: Compare and branch immediate.
// Conditional branch.
// System.
// 6,7: Unconditional branch.
// Test and branch immediate.
case 0x4:
case 0x5:
case 0x6:
case 0x7: DecodeBranchSystemException(instr); break;
// 8,9: Load/store register pair post-index.
// Load register literal.
// Load/store register unscaled immediate.
// Load/store register immediate post-index.
// Load/store register immediate pre-index.
// Load/store register offset.
// Load/store exclusive.
// C,D: Load/store register pair offset.
// Load/store register pair pre-index.
// Load/store register unsigned immediate.
// Advanced SIMD.
case 0x8:
case 0x9:
case 0xC:
case 0xD: DecodeLoadStore(instr); break;
// E: FP fixed point conversion.
// FP integer conversion.
// FP data processing 1 source.
// FP compare.
// FP immediate.
// FP data processing 2 source.
// FP conditional compare.
// FP conditional select.
// Advanced SIMD.
// F: FP data processing 3 source.
// Advanced SIMD.
case 0xE:
case 0xF: DecodeFP(instr); break;
}
}
}
void Decoder::AppendVisitor(DecoderVisitor* new_visitor) {
visitors_.push_back(new_visitor);
}
void Decoder::PrependVisitor(DecoderVisitor* new_visitor) {
visitors_.push_front(new_visitor);
}
void Decoder::InsertVisitorBefore(DecoderVisitor* new_visitor,
DecoderVisitor* registered_visitor) {
std::list<DecoderVisitor*>::iterator it;
for (it = visitors_.begin(); it != visitors_.end(); it++) {
if (*it == registered_visitor) {
visitors_.insert(it, new_visitor);
return;
}
}
// We reached the end of the list. The last element must be
// registered_visitor.
VIXL_ASSERT(*it == registered_visitor);
visitors_.insert(it, new_visitor);
}
void Decoder::InsertVisitorAfter(DecoderVisitor* new_visitor,
DecoderVisitor* registered_visitor) {
std::list<DecoderVisitor*>::iterator it;
for (it = visitors_.begin(); it != visitors_.end(); it++) {
if (*it == registered_visitor) {
it++;
visitors_.insert(it, new_visitor);
return;
}
}
// We reached the end of the list. The last element must be
// registered_visitor.
VIXL_ASSERT(*it == registered_visitor);
visitors_.push_back(new_visitor);
}
void Decoder::RemoveVisitor(DecoderVisitor* visitor) {
visitors_.remove(visitor);
}
void Decoder::DecodePCRelAddressing(const Instruction* instr) {
VIXL_ASSERT(instr->Bits(27, 24) == 0x0);
// We know bit 28 is set, as <b28:b27> = 0 is filtered out at the top level
// decode.
VIXL_ASSERT(instr->Bit(28) == 0x1);
VisitPCRelAddressing(instr);
}
void Decoder::DecodeBranchSystemException(const Instruction* instr) {
VIXL_ASSERT((instr->Bits(27, 24) == 0x4) ||
(instr->Bits(27, 24) == 0x5) ||
(instr->Bits(27, 24) == 0x6) ||
(instr->Bits(27, 24) == 0x7) );
switch (instr->Bits(31, 29)) {
case 0:
case 4: {
VisitUnconditionalBranch(instr);
break;
}
case 1:
case 5: {
if (instr->Bit(25) == 0) {
VisitCompareBranch(instr);
} else {
VisitTestBranch(instr);
}
break;
}
case 2: {
if (instr->Bit(25) == 0) {
if ((instr->Bit(24) == 0x1) ||
(instr->Mask(0x01000010) == 0x00000010)) {
VisitUnallocated(instr);
} else {
VisitConditionalBranch(instr);
}
} else {
VisitUnallocated(instr);
}
break;
}
case 6: {
if (instr->Bit(25) == 0) {
if (instr->Bit(24) == 0) {
if ((instr->Bits(4, 2) != 0) ||
(instr->Mask(0x00E0001D) == 0x00200001) ||
(instr->Mask(0x00E0001D) == 0x00400001) ||
(instr->Mask(0x00E0001E) == 0x00200002) ||
(instr->Mask(0x00E0001E) == 0x00400002) ||
(instr->Mask(0x00E0001C) == 0x00600000) ||
(instr->Mask(0x00E0001C) == 0x00800000) ||
(instr->Mask(0x00E0001F) == 0x00A00000) ||
(instr->Mask(0x00C0001C) == 0x00C00000)) {
VisitUnallocated(instr);
} else {
VisitException(instr);
}
} else {
if (instr->Bits(23, 22) == 0) {
const Instr masked_003FF0E0 = instr->Mask(0x003FF0E0);
if ((instr->Bits(21, 19) == 0x4) ||
(masked_003FF0E0 == 0x00033000) ||
(masked_003FF0E0 == 0x003FF020) ||
(masked_003FF0E0 == 0x003FF060) ||
(masked_003FF0E0 == 0x003FF0E0) ||
(instr->Mask(0x00388000) == 0x00008000) ||
(instr->Mask(0x0038E000) == 0x00000000) ||
(instr->Mask(0x0039E000) == 0x00002000) ||
(instr->Mask(0x003AE000) == 0x00002000) ||
(instr->Mask(0x003CE000) == 0x00042000) ||
(instr->Mask(0x003FFFC0) == 0x000320C0) ||
(instr->Mask(0x003FF100) == 0x00032100) ||
(instr->Mask(0x003FF200) == 0x00032200) ||
(instr->Mask(0x003FF400) == 0x00032400) ||
(instr->Mask(0x003FF800) == 0x00032800) ||
(instr->Mask(0x0038F000) == 0x00005000) ||
(instr->Mask(0x0038E000) == 0x00006000)) {
VisitUnallocated(instr);
} else {
VisitSystem(instr);
}
} else {
VisitUnallocated(instr);
}
}
} else {
if ((instr->Bit(24) == 0x1) ||
(instr->Bits(20, 16) != 0x1F) ||
(instr->Bits(15, 10) != 0) ||
(instr->Bits(4, 0) != 0) ||
(instr->Bits(24, 21) == 0x3) ||
(instr->Bits(24, 22) == 0x3)) {
VisitUnallocated(instr);
} else {
VisitUnconditionalBranchToRegister(instr);
}
}
break;
}
case 3:
case 7: {
VisitUnallocated(instr);
break;
}
}
}
void Decoder::DecodeLoadStore(const Instruction* instr) {
VIXL_ASSERT((instr->Bits(27, 24) == 0x8) ||
(instr->Bits(27, 24) == 0x9) ||
(instr->Bits(27, 24) == 0xC) ||
(instr->Bits(27, 24) == 0xD) );
// TODO(all): rearrange the tree to integrate this branch.
if ((instr->Bit(28) == 0) && (instr->Bit(29) == 0) && (instr->Bit(26) == 1)) {
DecodeNEONLoadStore(instr);
return;
}
if (instr->Bit(24) == 0) {
if (instr->Bit(28) == 0) {
if (instr->Bit(29) == 0) {
if (instr->Bit(26) == 0) {
VisitLoadStoreExclusive(instr);
} else {
VIXL_UNREACHABLE();
}
} else {
if ((instr->Bits(31, 30) == 0x3) ||
(instr->Mask(0xC4400000) == 0x40000000)) {
VisitUnallocated(instr);
} else {
if (instr->Bit(23) == 0) {
if (instr->Mask(0xC4400000) == 0xC0400000) {
VisitUnallocated(instr);
} else {
VisitLoadStorePairNonTemporal(instr);
}
} else {
VisitLoadStorePairPostIndex(instr);
}
}
}
} else {
if (instr->Bit(29) == 0) {
if (instr->Mask(0xC4000000) == 0xC4000000) {
VisitUnallocated(instr);
} else {
VisitLoadLiteral(instr);
}
} else {
if ((instr->Mask(0x84C00000) == 0x80C00000) ||
(instr->Mask(0x44800000) == 0x44800000) ||
(instr->Mask(0x84800000) == 0x84800000)) {
VisitUnallocated(instr);
} else {
if (instr->Bit(21) == 0) {
switch (instr->Bits(11, 10)) {
case 0: {
VisitLoadStoreUnscaledOffset(instr);
break;
}
case 1: {
if (instr->Mask(0xC4C00000) == 0xC0800000) {
VisitUnallocated(instr);
} else {
VisitLoadStorePostIndex(instr);
}
break;
}
case 2: {
// TODO: VisitLoadStoreRegisterOffsetUnpriv.
VisitUnimplemented(instr);
break;
}
case 3: {
if (instr->Mask(0xC4C00000) == 0xC0800000) {
VisitUnallocated(instr);
} else {
VisitLoadStorePreIndex(instr);
}
break;
}
}
} else {
if (instr->Bits(11, 10) == 0x2) {
if (instr->Bit(14) == 0) {
VisitUnallocated(instr);
} else {
VisitLoadStoreRegisterOffset(instr);
}
} else {
VisitUnallocated(instr);
}
}
}
}
}
} else {
if (instr->Bit(28) == 0) {
if (instr->Bit(29) == 0) {
VisitUnallocated(instr);
} else {
if ((instr->Bits(31, 30) == 0x3) ||
(instr->Mask(0xC4400000) == 0x40000000)) {
VisitUnallocated(instr);
} else {
if (instr->Bit(23) == 0) {
VisitLoadStorePairOffset(instr);
} else {
VisitLoadStorePairPreIndex(instr);
}
}
}
} else {
if (instr->Bit(29) == 0) {
VisitUnallocated(instr);
} else {
if ((instr->Mask(0x84C00000) == 0x80C00000) ||
(instr->Mask(0x44800000) == 0x44800000) ||
(instr->Mask(0x84800000) == 0x84800000)) {
VisitUnallocated(instr);
} else {
VisitLoadStoreUnsignedOffset(instr);
}
}
}
}
}
void Decoder::DecodeLogical(const Instruction* instr) {
VIXL_ASSERT(instr->Bits(27, 24) == 0x2);
if (instr->Mask(0x80400000) == 0x00400000) {
VisitUnallocated(instr);
} else {
if (instr->Bit(23) == 0) {
VisitLogicalImmediate(instr);
} else {
if (instr->Bits(30, 29) == 0x1) {
VisitUnallocated(instr);
} else {
VisitMoveWideImmediate(instr);
}
}
}
}
void Decoder::DecodeBitfieldExtract(const Instruction* instr) {
VIXL_ASSERT(instr->Bits(27, 24) == 0x3);
if ((instr->Mask(0x80400000) == 0x80000000) ||
(instr->Mask(0x80400000) == 0x00400000) ||
(instr->Mask(0x80008000) == 0x00008000)) {
VisitUnallocated(instr);
} else if (instr->Bit(23) == 0) {
if ((instr->Mask(0x80200000) == 0x00200000) ||
(instr->Mask(0x60000000) == 0x60000000)) {
VisitUnallocated(instr);
} else {
VisitBitfield(instr);
}
} else {
if ((instr->Mask(0x60200000) == 0x00200000) ||
(instr->Mask(0x60000000) != 0x00000000)) {
VisitUnallocated(instr);
} else {
VisitExtract(instr);
}
}
}
void Decoder::DecodeAddSubImmediate(const Instruction* instr) {
VIXL_ASSERT(instr->Bits(27, 24) == 0x1);
if (instr->Bit(23) == 1) {
VisitUnallocated(instr);
} else {
VisitAddSubImmediate(instr);
}
}
void Decoder::DecodeDataProcessing(const Instruction* instr) {
VIXL_ASSERT((instr->Bits(27, 24) == 0xA) ||
(instr->Bits(27, 24) == 0xB));
if (instr->Bit(24) == 0) {
if (instr->Bit(28) == 0) {
if (instr->Mask(0x80008000) == 0x00008000) {
VisitUnallocated(instr);
} else {
VisitLogicalShifted(instr);
}
} else {
switch (instr->Bits(23, 21)) {
case 0: {
if (instr->Mask(0x0000FC00) != 0) {
VisitUnallocated(instr);
} else {
VisitAddSubWithCarry(instr);
}
break;
}
case 2: {
if ((instr->Bit(29) == 0) ||
(instr->Mask(0x00000410) != 0)) {
VisitUnallocated(instr);
} else {
if (instr->Bit(11) == 0) {
VisitConditionalCompareRegister(instr);
} else {
VisitConditionalCompareImmediate(instr);
}
}
break;
}
case 4: {
if (instr->Mask(0x20000800) != 0x00000000) {
VisitUnallocated(instr);
} else {
VisitConditionalSelect(instr);
}
break;
}
case 6: {
if (instr->Bit(29) == 0x1) {
VisitUnallocated(instr);
VIXL_FALLTHROUGH();
} else {
if (instr->Bit(30) == 0) {
if ((instr->Bit(15) == 0x1) ||
(instr->Bits(15, 11) == 0) ||
(instr->Bits(15, 12) == 0x1) ||
(instr->Bits(15, 12) == 0x3) ||
(instr->Bits(15, 13) == 0x3) ||
(instr->Mask(0x8000EC00) == 0x00004C00) ||
(instr->Mask(0x8000E800) == 0x80004000) ||
(instr->Mask(0x8000E400) == 0x80004000)) {
VisitUnallocated(instr);
} else {
VisitDataProcessing2Source(instr);
}
} else {
if ((instr->Bit(13) == 1) ||
(instr->Bits(20, 16) != 0) ||
(instr->Bits(15, 14) != 0) ||
(instr->Mask(0xA01FFC00) == 0x00000C00) ||
(instr->Mask(0x201FF800) == 0x00001800)) {
VisitUnallocated(instr);
} else {
VisitDataProcessing1Source(instr);
}
}
break;
}
}
case 1:
case 3:
case 5:
case 7: VisitUnallocated(instr); break;
}
}
} else {
if (instr->Bit(28) == 0) {
if (instr->Bit(21) == 0) {
if ((instr->Bits(23, 22) == 0x3) ||
(instr->Mask(0x80008000) == 0x00008000)) {
VisitUnallocated(instr);
} else {
VisitAddSubShifted(instr);
}
} else {
if ((instr->Mask(0x00C00000) != 0x00000000) ||
(instr->Mask(0x00001400) == 0x00001400) ||
(instr->Mask(0x00001800) == 0x00001800)) {
VisitUnallocated(instr);
} else {
VisitAddSubExtended(instr);
}
}
} else {
if ((instr->Bit(30) == 0x1) ||
(instr->Bits(30, 29) == 0x1) ||
(instr->Mask(0xE0600000) == 0x00200000) ||
(instr->Mask(0xE0608000) == 0x00400000) ||
(instr->Mask(0x60608000) == 0x00408000) ||
(instr->Mask(0x60E00000) == 0x00E00000) ||
(instr->Mask(0x60E00000) == 0x00800000) ||
(instr->Mask(0x60E00000) == 0x00600000)) {
VisitUnallocated(instr);
} else {
VisitDataProcessing3Source(instr);
}
}
}
}
void Decoder::DecodeFP(const Instruction* instr) {
VIXL_ASSERT((instr->Bits(27, 24) == 0xE) ||
(instr->Bits(27, 24) == 0xF));
if (instr->Bit(28) == 0) {
DecodeNEONVectorDataProcessing(instr);
} else {
if (instr->Bits(31, 30) == 0x3) {
VisitUnallocated(instr);
} else if (instr->Bits(31, 30) == 0x1) {
DecodeNEONScalarDataProcessing(instr);
} else {
if (instr->Bit(29) == 0) {
if (instr->Bit(24) == 0) {
if (instr->Bit(21) == 0) {
if ((instr->Bit(23) == 1) ||
(instr->Bit(18) == 1) ||
(instr->Mask(0x80008000) == 0x00000000) ||
(instr->Mask(0x000E0000) == 0x00000000) ||
(instr->Mask(0x000E0000) == 0x000A0000) ||
(instr->Mask(0x00160000) == 0x00000000) ||
(instr->Mask(0x00160000) == 0x00120000)) {
VisitUnallocated(instr);
} else {
VisitFPFixedPointConvert(instr);
}
} else {
if (instr->Bits(15, 10) == 32) {
VisitUnallocated(instr);
} else if (instr->Bits(15, 10) == 0) {
if ((instr->Bits(23, 22) == 0x3) ||
(instr->Mask(0x000E0000) == 0x000A0000) ||
(instr->Mask(0x000E0000) == 0x000C0000) ||
                  (instr->Mask(0x00160000) == 0x00120000) ||
                  (instr->Mask(0x00160000) == 0x00140000) ||
                  (instr->Mask(0x20C40000) == 0x00800000) ||
                  (instr->Mask(0x20C60000) == 0x00840000) ||
                  (instr->Mask(0xA0C60000) == 0x80060000) ||
                  (instr->Mask(0xA0C60000) == 0x00860000) ||
                  (instr->Mask(0xA0C60000) == 0x00460000) ||
                  (instr->Mask(0xA0CE0000) == 0x80860000) ||
                  (instr->Mask(0xA0CE0000) == 0x804E0000) ||
                  (instr->Mask(0xA0CE0000) == 0x000E0000) ||
                  (instr->Mask(0xA0D60000) == 0x00160000) ||
                  (instr->Mask(0xA0D60000) == 0x80560000) ||
                  (instr->Mask(0xA0D60000) == 0x80960000)) {
                VisitUnallocated(instr);
              } else {
                VisitFPIntegerConvert(instr);
              }
            } else if (instr->Bits(14, 10) == 16) {
              const Instr masked_A0DF8000 = instr->Mask(0xA0DF8000);
              if ((instr->Mask(0x80180000) != 0) ||
                  (masked_A0DF8000 == 0x00020000) ||
                  (masked_A0DF8000 == 0x00030000) ||
                  (masked_A0DF8000 == 0x00068000) ||
                  (masked_A0DF8000 == 0x00428000) ||
                  (masked_A0DF8000 == 0x00430000) ||
                  (masked_A0DF8000 == 0x00468000) ||
                  (instr->Mask(0xA0D80000) == 0x00800000) ||
                  (instr->Mask(0xA0DE0000) == 0x00C00000) ||
                  (instr->Mask(0xA0DF0000) == 0x00C30000) ||
                  (instr->Mask(0xA0DC0000) == 0x00C40000)) {
                VisitUnallocated(instr);
              } else {
                VisitFPDataProcessing1Source(instr);
              }
            } else if (instr->Bits(13, 10) == 8) {
              if ((instr->Bits(15, 14) != 0) ||
                  (instr->Bits(2, 0) != 0) ||
                  (instr->Mask(0x80800000) != 0x00000000)) {
                VisitUnallocated(instr);
              } else {
                VisitFPCompare(instr);
              }
            } else if (instr->Bits(12, 10) == 4) {
              if ((instr->Bits(9, 5) != 0) ||
                  (instr->Mask(0x80800000) != 0x00000000)) {
                VisitUnallocated(instr);
              } else {
                VisitFPImmediate(instr);
              }
            } else {
              if (instr->Mask(0x80800000) != 0x00000000) {
                VisitUnallocated(instr);
              } else {
                switch (instr->Bits(11, 10)) {
                  case 1: {
                    VisitFPConditionalCompare(instr);
                    break;
                  }
                  case 2: {
                    if ((instr->Bits(15, 14) == 0x3) ||
                        (instr->Mask(0x00009000) == 0x00009000) ||
                        (instr->Mask(0x0000A000) == 0x0000A000)) {
                      VisitUnallocated(instr);
                    } else {
                      VisitFPDataProcessing2Source(instr);
                    }
                    break;
                  }
                  case 3: {
                    VisitFPConditionalSelect(instr);
                    break;
                  }
                  default: VIXL_UNREACHABLE();
                }
              }
            }
          }
        } else {
          // Bit 30 == 1 has been handled earlier.
          VIXL_ASSERT(instr->Bit(30) == 0);
          if (instr->Mask(0xA0800000) != 0) {
            VisitUnallocated(instr);
          } else {
            VisitFPDataProcessing3Source(instr);
          }
        }
      } else {
        VisitUnallocated(instr);
      }
    }
  }
}

void Decoder::DecodeNEONLoadStore(const Instruction* instr) {
  VIXL_ASSERT(instr->Bits(29, 25) == 0x6);
  if (instr->Bit(31) == 0) {
    if ((instr->Bit(24) == 0) && (instr->Bit(21) == 1)) {
      VisitUnallocated(instr);
      return;
    }

    if (instr->Bit(23) == 0) {
      if (instr->Bits(20, 16) == 0) {
        if (instr->Bit(24) == 0) {
          VisitNEONLoadStoreMultiStruct(instr);
        } else {
          VisitNEONLoadStoreSingleStruct(instr);
        }
      } else {
        VisitUnallocated(instr);
      }
    } else {
      if (instr->Bit(24) == 0) {
        VisitNEONLoadStoreMultiStructPostIndex(instr);
      } else {
        VisitNEONLoadStoreSingleStructPostIndex(instr);
      }
    }
  } else {
    VisitUnallocated(instr);
  }
}

void Decoder::DecodeNEONVectorDataProcessing(const Instruction* instr) {
  VIXL_ASSERT(instr->Bits(28, 25) == 0x7);
  if (instr->Bit(31) == 0) {
    if (instr->Bit(24) == 0) {
      if (instr->Bit(21) == 0) {
        if (instr->Bit(15) == 0) {
          if (instr->Bit(10) == 0) {
            if (instr->Bit(29) == 0) {
              if (instr->Bit(11) == 0) {
                VisitNEONTable(instr);
              } else {
                VisitNEONPerm(instr);
              }
            } else {
              VisitNEONExtract(instr);
            }
          } else {
            if (instr->Bits(23, 22) == 0) {
              VisitNEONCopy(instr);
            } else {
              VisitUnallocated(instr);
            }
          }
        } else {
          VisitUnallocated(instr);
        }
      } else {
        if (instr->Bit(10) == 0) {
          if (instr->Bit(11) == 0) {
            VisitNEON3Different(instr);
          } else {
            if (instr->Bits(18, 17) == 0) {
              if (instr->Bit(20) == 0) {
                if (instr->Bit(19) == 0) {
                  VisitNEON2RegMisc(instr);
                } else {
                  if (instr->Bits(30, 29) == 0x2) {
                    VisitCryptoAES(instr);
                  } else {
                    VisitUnallocated(instr);
                  }
                }
              } else {
                if (instr->Bit(19) == 0) {
                  VisitNEONAcrossLanes(instr);
                } else {
                  VisitUnallocated(instr);
                }
              }
            } else {
              VisitUnallocated(instr);
            }
          }
        } else {
          VisitNEON3Same(instr);
        }
      }
    } else {
      if (instr->Bit(10) == 0) {
        VisitNEONByIndexedElement(instr);
      } else {
        if (instr->Bit(23) == 0) {
          if (instr->Bits(22, 19) == 0) {
            VisitNEONModifiedImmediate(instr);
          } else {
            VisitNEONShiftImmediate(instr);
          }
        } else {
          VisitUnallocated(instr);
        }
      }
    }
  } else {
    VisitUnallocated(instr);
  }
}

void Decoder::DecodeNEONScalarDataProcessing(const Instruction* instr) {
  VIXL_ASSERT(instr->Bits(28, 25) == 0xF);
  if (instr->Bit(24) == 0) {
    if (instr->Bit(21) == 0) {
      if (instr->Bit(15) == 0) {
        if (instr->Bit(10) == 0) {
          if (instr->Bit(29) == 0) {
            if (instr->Bit(11) == 0) {
              VisitCrypto3RegSHA(instr);
            } else {
              VisitUnallocated(instr);
            }
          } else {
            VisitUnallocated(instr);
          }
        } else {
          if (instr->Bits(23, 22) == 0) {
            VisitNEONScalarCopy(instr);
          } else {
            VisitUnallocated(instr);
          }
        }
      } else {
        VisitUnallocated(instr);
      }
    } else {
      if (instr->Bit(10) == 0) {
        if (instr->Bit(11) == 0) {
          VisitNEONScalar3Diff(instr);
        } else {
          if (instr->Bits(18, 17) == 0) {
            if (instr->Bit(20) == 0) {
              if (instr->Bit(19) == 0) {
                VisitNEONScalar2RegMisc(instr);
              } else {
                if (instr->Bit(29) == 0) {
                  VisitCrypto2RegSHA(instr);
                } else {
                  VisitUnallocated(instr);
                }
              }
            } else {
              if (instr->Bit(19) == 0) {
                VisitNEONScalarPairwise(instr);
              } else {
                VisitUnallocated(instr);
              }
            }
          } else {
            VisitUnallocated(instr);
          }
        }
      } else {
        VisitNEONScalar3Same(instr);
      }
    }
  } else {
    if (instr->Bit(10) == 0) {
      VisitNEONScalarByIndexedElement(instr);
    } else {
      if (instr->Bit(23) == 0) {
        VisitNEONScalarShiftImmediate(instr);
      } else {
        VisitUnallocated(instr);
      }
    }
  }
}

// Generate one Visit* caller per instruction class: assert that the fixed
// bits of the encoding match, then forward the instruction to every
// registered visitor in registration order.
#define DEFINE_VISITOR_CALLERS(A)                                  \
  void Decoder::Visit##A(const Instruction *instr) {               \
    VIXL_ASSERT(instr->Mask(A##FMask) == A##Fixed);                \
    std::list<DecoderVisitor*>::iterator it;                       \
    for (it = visitors_.begin(); it != visitors_.end(); it++) {    \
      (*it)->Visit##A(instr);                                      \
    }                                                              \
  }
VISITOR_LIST(DEFINE_VISITOR_CALLERS)
#undef DEFINE_VISITOR_CALLERS
} // namespace vixl
