Merge what is left of qemu-tech into the main manual as an appendix. Ultimately we should have a new internals manual built from docs/, and then the "Translator Internals" parts of qemu-tech could move to docs/ as well. The bits on limitation and features of CPU emulation should remain in qemu-doc. Reviewed-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
		
			
				
	
	
		
			372 lines
		
	
	
		
			12 KiB
		
	
	
	
		
			Plaintext
		
	
	
	
	
	
			
		
		
	
	
			372 lines
		
	
	
		
			12 KiB
		
	
	
	
		
			Plaintext
		
	
	
	
	
	
@node Implementation notes
 | 
						|
@appendix Implementation notes
 | 
						|
 | 
						|
@menu
 | 
						|
* CPU emulation::
 | 
						|
* Translator Internals::
 | 
						|
* QEMU compared to other emulators::
 | 
						|
* Bibliography::
 | 
						|
@end menu
 | 
						|
 | 
						|
@node CPU emulation
 | 
						|
@section CPU emulation
 | 
						|
 | 
						|
@menu
 | 
						|
* x86::     x86 and x86-64 emulation
 | 
						|
* ARM::     ARM emulation
 | 
						|
* MIPS::    MIPS emulation
 | 
						|
* PPC::     PowerPC emulation
 | 
						|
* SPARC::   Sparc32 and Sparc64 emulation
 | 
						|
* Xtensa::  Xtensa emulation
 | 
						|
@end menu
 | 
						|
 | 
						|
@node x86
 | 
						|
@subsection x86 and x86-64 emulation
 | 
						|
 | 
						|
QEMU x86 target features:
 | 
						|
 | 
						|
@itemize
 | 
						|
 | 
						|
@item The virtual x86 CPU supports 16 bit and 32 bit addressing with segmentation.
 | 
						|
LDT/GDT and IDT are emulated. VM86 mode is also supported to run
 | 
						|
DOSEMU. There is some support for MMX/3DNow!, SSE, SSE2, SSE3, SSSE3,
 | 
						|
and SSE4 as well as x86-64 SVM.
 | 
						|
 | 
						|
@item Support of host page sizes bigger than 4KB in user mode emulation.
 | 
						|
 | 
						|
@item QEMU can emulate itself on x86.
 | 
						|
 | 
						|
@item An extensive Linux x86 CPU test program is included @file{tests/test-i386}.
 | 
						|
It can be used to test other x86 virtual CPUs.
 | 
						|
 | 
						|
@end itemize
 | 
						|
 | 
						|
Current QEMU limitations:
 | 
						|
 | 
						|
@itemize
 | 
						|
 | 
						|
@item Limited x86-64 support.
 | 
						|
 | 
						|
@item IPC syscalls are missing.
 | 
						|
 | 
						|
@item The x86 segment limits and access rights are not tested at every
 | 
						|
memory access (yet). Hopefully, very few OSes seem to rely on that for
 | 
						|
normal use.
 | 
						|
 | 
						|
@end itemize
 | 
						|
 | 
						|
@node ARM
 | 
						|
@subsection ARM emulation
 | 
						|
 | 
						|
@itemize
 | 
						|
 | 
						|
@item Full ARM 7 user emulation.
 | 
						|
 | 
						|
@item NWFPE FPU support included in user Linux emulation.
 | 
						|
 | 
						|
@item Can run most ARM Linux binaries.
 | 
						|
 | 
						|
@end itemize
 | 
						|
 | 
						|
@node MIPS
 | 
						|
@subsection MIPS emulation
 | 
						|
 | 
						|
@itemize
 | 
						|
 | 
						|
@item The system emulation allows full MIPS32/MIPS64 Release 2 emulation,
 | 
						|
including privileged instructions, FPU and MMU, in both little and big
 | 
						|
endian modes.
 | 
						|
 | 
						|
@item The Linux userland emulation can run many 32 bit MIPS Linux binaries.
 | 
						|
 | 
						|
@end itemize
 | 
						|
 | 
						|
Current QEMU limitations:
 | 
						|
 | 
						|
@itemize
 | 
						|
 | 
						|
@item Self-modifying code is not always handled correctly.
 | 
						|
 | 
						|
@item 64 bit userland emulation is not implemented.
 | 
						|
 | 
						|
@item The system emulation is not complete enough to run real firmware.
 | 
						|
 | 
						|
@item The watchpoint debug facility is not implemented.
 | 
						|
 | 
						|
@end itemize
 | 
						|
 | 
						|
@node PPC
 | 
						|
@subsection PowerPC emulation
 | 
						|
 | 
						|
@itemize
 | 
						|
 | 
						|
@item Full PowerPC 32 bit emulation, including privileged instructions,
 | 
						|
FPU and MMU.
 | 
						|
 | 
						|
@item Can run most PowerPC Linux binaries.
 | 
						|
 | 
						|
@end itemize
 | 
						|
 | 
						|
@node SPARC
 | 
						|
@subsection Sparc32 and Sparc64 emulation
 | 
						|
 | 
						|
@itemize
 | 
						|
 | 
						|
@item Full SPARC V8 emulation, including privileged
 | 
						|
instructions, FPU and MMU. SPARC V9 emulation includes most privileged
 | 
						|
and VIS instructions, FPU and I/D MMU. Alignment is fully enforced.
 | 
						|
 | 
						|
@item Can run most 32-bit SPARC Linux binaries, SPARC32PLUS Linux binaries and
 | 
						|
some 64-bit SPARC Linux binaries.
 | 
						|
 | 
						|
@end itemize
 | 
						|
 | 
						|
Current QEMU limitations:
 | 
						|
 | 
						|
@itemize
 | 
						|
 | 
						|
@item IPC syscalls are missing.
 | 
						|
 | 
						|
@item Floating point exception support is buggy.
 | 
						|
 | 
						|
@item Atomic instructions are not correctly implemented.
 | 
						|
 | 
						|
@item There are still some problems with Sparc64 emulators.
 | 
						|
 | 
						|
@end itemize
 | 
						|
 | 
						|
@node Xtensa
 | 
						|
@subsection Xtensa emulation
 | 
						|
 | 
						|
@itemize
 | 
						|
 | 
						|
@item Core Xtensa ISA emulation, including most options: code density,
 | 
						|
loop, extended L32R, 16- and 32-bit multiplication, 32-bit division,
 | 
						|
MAC16, miscellaneous operations, boolean, FP coprocessor, coprocessor
 | 
						|
context, debug, multiprocessor synchronization,
 | 
						|
conditional store, exceptions, relocatable vectors, unaligned exception,
 | 
						|
interrupts (including high priority and timer), hardware alignment,
 | 
						|
region protection, region translation, MMU, windowed registers, thread
 | 
						|
pointer, processor ID.
 | 
						|
 | 
						|
@item Not implemented options: data/instruction cache (including cache
 | 
						|
prefetch and locking), XLMI, processor interface. Also options not
 | 
						|
covered by the core ISA (e.g. FLIX, wide branches) are not implemented.
 | 
						|
 | 
						|
@item Can run most Xtensa Linux binaries.
 | 
						|
 | 
						|
@item New core configuration that requires no additional instructions
 | 
						|
may be created from overlay with minimal amount of hand-written code.
 | 
						|
 | 
						|
@end itemize
 | 
						|
 | 
						|
@node Translator Internals
 | 
						|
@section Translator Internals
 | 
						|
 | 
						|
QEMU is a dynamic translator. When it first encounters a piece of code,
 | 
						|
it converts it to the host instruction set. Usually dynamic translators
 | 
						|
are very complicated and highly CPU dependent. QEMU uses some tricks
 | 
						|
which make it relatively easily portable and simple while achieving good
 | 
						|
performances.
 | 
						|
 | 
						|
QEMU's dynamic translation backend is called TCG, for "Tiny Code
 | 
						|
Generator". For more information, please take a look at @code{tcg/README}.
 | 
						|
 | 
						|
Some notable features of QEMU's dynamic translator are:
 | 
						|
 | 
						|
@table @strong
 | 
						|
 | 
						|
@item CPU state optimisations:
 | 
						|
The target CPUs have many internal states which change the way it
 | 
						|
evaluates instructions. In order to achieve a good speed, the
 | 
						|
translation phase considers that some state information of the virtual
 | 
						|
CPU cannot change in it. The state is recorded in the Translation
 | 
						|
Block (TB). If the state changes (e.g. privilege level), a new TB will
 | 
						|
be generated and the previous TB won't be used anymore until the state
 | 
						|
matches the state recorded in the previous TB. The same idea can be applied
 | 
						|
to other aspects of the CPU state.  For example, on x86, if the SS,
 | 
						|
DS and ES segments have a zero base, then the translator does not even
 | 
						|
generate an addition for the segment base.
 | 
						|
 | 
						|
@item Direct block chaining:
 | 
						|
After each translated basic block is executed, QEMU uses the simulated
 | 
						|
Program Counter (PC) and other cpu state information (such as the CS
 | 
						|
segment base value) to find the next basic block.
 | 
						|
 | 
						|
In order to accelerate the most common cases where the new simulated PC
 | 
						|
is known, QEMU can patch a basic block so that it jumps directly to the
 | 
						|
next one.
 | 
						|
 | 
						|
The most portable code uses an indirect jump. An indirect jump makes
 | 
						|
it easier to make the jump target modification atomic. On some host
 | 
						|
architectures (such as x86 or PowerPC), the @code{JUMP} opcode is
 | 
						|
directly patched so that the block chaining has no overhead.
 | 
						|
 | 
						|
@item Self-modifying code and translated code invalidation:
 | 
						|
Self-modifying code is a special challenge in x86 emulation because no
 | 
						|
instruction cache invalidation is signaled by the application when code
 | 
						|
is modified.
 | 
						|
 | 
						|
User-mode emulation marks a host page as write-protected (if it is
 | 
						|
not already read-only) every time translated code is generated for a
 | 
						|
basic block.  Then, if a write access is done to the page, Linux raises
 | 
						|
a SEGV signal. QEMU then invalidates all the translated code in the page
 | 
						|
and enables write accesses to the page.  For system emulation, write
 | 
						|
protection is achieved through the software MMU.
 | 
						|
 | 
						|
Correct translated code invalidation is done efficiently by maintaining
 | 
						|
a linked list of every translated block contained in a given page. Other
 | 
						|
linked lists are also maintained to undo direct block chaining.
 | 
						|
 | 
						|
On RISC targets, correctly written software uses memory barriers and
 | 
						|
cache flushes, so some of the protection above would not be
 | 
						|
necessary. However, QEMU still requires that the generated code always
 | 
						|
matches the target instructions in memory in order to handle
 | 
						|
exceptions correctly.
 | 
						|
 | 
						|
@item Exception support:
 | 
						|
longjmp() is used when an exception such as division by zero is
 | 
						|
encountered.
 | 
						|
 | 
						|
The host SIGSEGV and SIGBUS signal handlers are used to get invalid
 | 
						|
memory accesses.  QEMU keeps a map from host program counter to
 | 
						|
target program counter, and looks up where the exception happened
 | 
						|
based on the host program counter at the exception point.
 | 
						|
 | 
						|
On some targets, some bits of the virtual CPU's state are not flushed to the
 | 
						|
memory until the end of the translation block.  This is done for internal
 | 
						|
emulation state that is rarely accessed directly by the program and/or changes
 | 
						|
very often throughout the execution of a translation block---this includes
 | 
						|
condition codes on x86, delay slots on SPARC, conditional execution on
 | 
						|
ARM, and so on.  This state is stored for each target instruction, and
 | 
						|
looked up on exceptions.
 | 
						|
 | 
						|
@item MMU emulation:
 | 
						|
For system emulation QEMU uses a software MMU. In that mode, the MMU
 | 
						|
virtual to physical address translation is done at every memory
 | 
						|
access.
 | 
						|
 | 
						|
QEMU uses an address translation cache (TLB) to speed up the translation.
 | 
						|
In order to avoid flushing the translated code each time the MMU
 | 
						|
mappings change, all caches in QEMU are physically indexed.  This
 | 
						|
means that each basic block is indexed with its physical address.
 | 
						|
 | 
						|
In order to avoid invalidating the basic block chain when MMU mappings
 | 
						|
change, chaining is only performed when the destination of the jump
 | 
						|
shares a page with the basic block that is performing the jump.
 | 
						|
 | 
						|
The MMU can also distinguish RAM and ROM memory areas from MMIO memory
 | 
						|
areas.  Access is faster for RAM and ROM because the translation cache also
 | 
						|
hosts the offset between guest address and host memory.  Accessing MMIO
 | 
						|
memory areas instead calls out to C code for device emulation.
 | 
						|
Finally, the MMU helps tracking dirty pages and pages pointed to by
 | 
						|
translation blocks.
 | 
						|
@end table
 | 
						|
 | 
						|
@node QEMU compared to other emulators
 | 
						|
@section QEMU compared to other emulators
 | 
						|
 | 
						|
Like bochs [1], QEMU emulates an x86 CPU. But QEMU is much faster than
 | 
						|
bochs as it uses dynamic compilation. Bochs is closely tied to x86 PC
 | 
						|
emulation while QEMU can emulate several processors.
 | 
						|
 | 
						|
Like Valgrind [2], QEMU does user space emulation and dynamic
 | 
						|
translation. Valgrind is mainly a memory debugger while QEMU has no
 | 
						|
support for it (QEMU could be used to detect out of bound memory
 | 
						|
accesses as Valgrind, but it has no support to track uninitialised data
 | 
						|
as Valgrind does). The Valgrind dynamic translator generates better code
 | 
						|
than QEMU (in particular it does register allocation) but it is closely
 | 
						|
tied to an x86 host and target and has no support for precise exceptions
 | 
						|
and system emulation.
 | 
						|
 | 
						|
EM86 [3] is the closest project to user space QEMU (and QEMU still uses
 | 
						|
some of its code, in particular the ELF file loader). EM86 was limited
 | 
						|
to an alpha host and used a proprietary and slow interpreter (the
 | 
						|
interpreter part of the FX!32 Digital Win32 code translator [4]).
 | 
						|
 | 
						|
TWIN from Willows Software was a Windows API emulator like Wine. It is less
 | 
						|
accurate than Wine but includes a protected mode x86 interpreter to launch
 | 
						|
x86 Windows executables. Such an approach has greater potential because most
 | 
						|
of the Windows API is executed natively but it is far more difficult to
 | 
						|
develop because all the data structures and function parameters exchanged
 | 
						|
between the API and the x86 code must be converted.
 | 
						|
 | 
						|
User mode Linux [5] was the only solution before QEMU to launch a
 | 
						|
Linux kernel as a process while not needing any host kernel
 | 
						|
patches. However, user mode Linux requires heavy kernel patches while
 | 
						|
QEMU accepts unpatched Linux kernels. The price to pay is that QEMU is
 | 
						|
slower.
 | 
						|
 | 
						|
The Plex86 [6] PC virtualizer is done in the same spirit as the now
 | 
						|
obsolete qemu-fast system emulator. It requires a patched Linux kernel
 | 
						|
to work (you cannot launch the same kernel on your PC), but the
 | 
						|
patches are really small. As it is a PC virtualizer (no emulation is
 | 
						|
done except for some privileged instructions), it has the potential of
 | 
						|
being faster than QEMU. The downside is that a complicated (and
 | 
						|
potentially unsafe) host kernel patch is needed.
 | 
						|
 | 
						|
The commercial PC Virtualizers (VMWare [7], VirtualPC [8]) are faster
 | 
						|
than QEMU (without virtualization), but they all need specific, proprietary
 | 
						|
and potentially unsafe host drivers. Moreover, they are unable to
 | 
						|
provide cycle exact simulation as an emulator can.
 | 
						|
 | 
						|
VirtualBox [9], Xen [10] and KVM [11] are based on QEMU. QEMU-SystemC
 | 
						|
[12] uses QEMU to simulate a system where some hardware devices are
 | 
						|
developed in SystemC.
 | 
						|
 | 
						|
@node Bibliography
 | 
						|
@section Bibliography
 | 
						|
 | 
						|
@table @asis
 | 
						|
 | 
						|
@item [1]
 | 
						|
@url{http://bochs.sourceforge.net/}, the Bochs IA-32 Emulator Project,
 | 
						|
by Kevin Lawton et al.
 | 
						|
 | 
						|
@item [2]
 | 
						|
@url{http://www.valgrind.org/}, Valgrind, an open-source memory debugger
 | 
						|
for GNU/Linux.
 | 
						|
 | 
						|
@item [3]
 | 
						|
@url{http://ftp.dreamtime.org/pub/linux/Linux-Alpha/em86/v0.2/docs/em86.html},
 | 
						|
the EM86 x86 emulator on Alpha-Linux.
 | 
						|
 | 
						|
@item [4]
 | 
						|
@url{http://www.usenix.org/publications/library/proceedings/usenix-nt97/@/full_papers/chernoff/chernoff.pdf},
 | 
						|
DIGITAL FX!32: Running 32-Bit x86 Applications on Alpha NT, by Anton
 | 
						|
Chernoff and Ray Hookway.
 | 
						|
 | 
						|
@item [5]
 | 
						|
@url{http://user-mode-linux.sourceforge.net/},
 | 
						|
The User-mode Linux Kernel.
 | 
						|
 | 
						|
@item [6]
 | 
						|
@url{http://www.plex86.org/},
 | 
						|
The new Plex86 project.
 | 
						|
 | 
						|
@item [7]
 | 
						|
@url{http://www.vmware.com/},
 | 
						|
The VMWare PC virtualizer.
 | 
						|
 | 
						|
@item [8]
 | 
						|
@url{https://www.microsoft.com/download/details.aspx?id=3702},
 | 
						|
The VirtualPC PC virtualizer.
 | 
						|
 | 
						|
@item [9]
 | 
						|
@url{http://virtualbox.org/},
 | 
						|
The VirtualBox PC virtualizer.
 | 
						|
 | 
						|
@item [10]
 | 
						|
@url{http://www.xen.org/},
 | 
						|
The Xen hypervisor.
 | 
						|
 | 
						|
@item [11]
 | 
						|
@url{http://www.linux-kvm.org/},
 | 
						|
Kernel Based Virtual Machine (KVM).
 | 
						|
 | 
						|
@item [12]
 | 
						|
@url{http://www.greensocs.com/projects/QEMUSystemC},
 | 
						|
QEMU-SystemC, a hardware co-simulator.
 | 
						|
 | 
						|
@end table
 |