Rather than allow arbitrary shift+trunc, only concern ourselves with low and high parts. This is all that was being used anyway. Signed-off-by: Richard Henderson <rth@twiddle.net>
		
			
				
	
	
		
			570 lines
		
	
	
		
			17 KiB
		
	
	
	
		
			Plaintext
		
	
	
	
	
	
			
		
		
	
	
			570 lines
		
	
	
		
			17 KiB
		
	
	
	
		
			Plaintext
		
	
	
	
	
	
Tiny Code Generator - Fabrice Bellard.
 | 
						|
 | 
						|
1) Introduction
 | 
						|
 | 
						|
TCG (Tiny Code Generator) began as a generic backend for a C
 | 
						|
compiler. It was simplified to be used in QEMU. It also has its roots
 | 
						|
in the QOP code generator written by Paul Brook. 
 | 
						|
 | 
						|
2) Definitions
 | 
						|
 | 
						|
The TCG "target" is the architecture for which we generate the
 | 
						|
code. It is of course not the same as the "target" of QEMU which is
 | 
						|
the emulated architecture. As TCG started as a generic C backend used
 | 
						|
for cross compiling, it is assumed that the TCG target is different
 | 
						|
from the host, although it is never the case for QEMU.
 | 
						|
 | 
						|
In this document, we use "guest" to specify what architecture we are
 | 
						|
emulating; "target" always means the TCG target, the machine on which
 | 
						|
we are running QEMU.
 | 
						|
 | 
						|
A TCG "function" corresponds to a QEMU Translated Block (TB).
 | 
						|
 | 
						|
A TCG "temporary" is a variable only live in a basic
 | 
						|
block. Temporaries are allocated explicitly in each function.
 | 
						|
 | 
						|
A TCG "local temporary" is a variable only live in a function. Local
 | 
						|
temporaries are allocated explicitly in each function.
 | 
						|
 | 
						|
A TCG "global" is a variable which is live in all the functions
 | 
						|
(equivalent of a C global variable). They are defined before the
 | 
						|
functions defined. A TCG global can be a memory location (e.g. a QEMU
 | 
						|
CPU register), a fixed host register (e.g. the QEMU CPU state pointer)
 | 
						|
or a memory location which is stored in a register outside QEMU TBs
 | 
						|
(not implemented yet).
 | 
						|
 | 
						|
A TCG "basic block" corresponds to a list of instructions terminated
 | 
						|
by a branch instruction. 
 | 
						|
 | 
						|
An operation with "undefined behavior" may result in a crash.
 | 
						|
 | 
						|
An operation with "unspecified behavior" shall not crash.  However,
 | 
						|
the result may be one of several possibilities so may be considered
 | 
						|
an "undefined result".
 | 
						|
 | 
						|
3) Intermediate representation
 | 
						|
 | 
						|
3.1) Introduction
 | 
						|
 | 
						|
TCG instructions operate on variables which are temporaries, local
 | 
						|
temporaries or globals. TCG instructions and variables are strongly
 | 
						|
typed. Two types are supported: 32 bit integers and 64 bit
 | 
						|
integers. Pointers are defined as an alias to 32 bit or 64 bit
 | 
						|
integers depending on the TCG target word size.
 | 
						|
 | 
						|
Each instruction has a fixed number of output variable operands, input
 | 
						|
variable operands and always constant operands.
 | 
						|
 | 
						|
The notable exception is the call instruction which has a variable
 | 
						|
number of outputs and inputs.
 | 
						|
 | 
						|
In the textual form, output operands usually come first, followed by
 | 
						|
input operands, followed by constant operands. The output type is
 | 
						|
included in the instruction name. Constants are prefixed with a '$'.
 | 
						|
 | 
						|
add_i32 t0, t1, t2  (t0 <- t1 + t2)
 | 
						|
 | 
						|
3.2) Assumptions
 | 
						|
 | 
						|
* Basic blocks
 | 
						|
 | 
						|
- Basic blocks end after branches (e.g. brcond_i32 instruction),
 | 
						|
  goto_tb and exit_tb instructions.
 | 
						|
- Basic blocks start after the end of a previous basic block, or at a
 | 
						|
  set_label instruction.
 | 
						|
 | 
						|
After the end of a basic block, the content of temporaries is
 | 
						|
destroyed, but local temporaries and globals are preserved.
 | 
						|
 | 
						|
* Floating point types are not supported yet
 | 
						|
 | 
						|
* Pointers: depending on the TCG target, pointer size is 32 bit or 64
 | 
						|
  bit. The type TCG_TYPE_PTR is an alias to TCG_TYPE_I32 or
 | 
						|
  TCG_TYPE_I64.
 | 
						|
 | 
						|
* Helpers:
 | 
						|
 | 
						|
Using the tcg_gen_helper_x_y it is possible to call any function
 | 
						|
taking i32, i64 or pointer types. By default, before calling a helper,
 | 
						|
all globals are stored at their canonical location and it is assumed
 | 
						|
that the function can modify them. By default, the helper is allowed to
 | 
						|
modify the CPU state or raise an exception.
 | 
						|
 | 
						|
This can be overridden using the following function modifiers:
 | 
						|
- TCG_CALL_NO_READ_GLOBALS means that the helper does not read globals,
 | 
						|
  either directly or via an exception. They will not be saved to their
 | 
						|
  canonical locations before calling the helper.
 | 
						|
- TCG_CALL_NO_WRITE_GLOBALS means that the helper does not modify any globals.
 | 
						|
  They will only be saved to their canonical location before calling helpers,
 | 
						|
  but they won't be reloaded afterwise.
 | 
						|
- TCG_CALL_NO_SIDE_EFFECTS means that the call to the function is removed if
 | 
						|
  the return value is not used.
 | 
						|
 | 
						|
Note that TCG_CALL_NO_READ_GLOBALS implies TCG_CALL_NO_WRITE_GLOBALS.
 | 
						|
 | 
						|
On some TCG targets (e.g. x86), several calling conventions are
 | 
						|
supported.
 | 
						|
 | 
						|
* Branches:
 | 
						|
 | 
						|
Use the instruction 'br' to jump to a label.
 | 
						|
 | 
						|
3.3) Code Optimizations
 | 
						|
 | 
						|
When generating instructions, you can count on at least the following
 | 
						|
optimizations:
 | 
						|
 | 
						|
- Single instructions are simplified, e.g.
 | 
						|
 | 
						|
   and_i32 t0, t0, $0xffffffff
 | 
						|
    
 | 
						|
  is suppressed.
 | 
						|
 | 
						|
- A liveness analysis is done at the basic block level. The
 | 
						|
  information is used to suppress moves from a dead variable to
 | 
						|
  another one. It is also used to remove instructions which compute
 | 
						|
  dead results. The later is especially useful for condition code
 | 
						|
  optimization in QEMU.
 | 
						|
 | 
						|
  In the following example:
 | 
						|
 | 
						|
  add_i32 t0, t1, t2
 | 
						|
  add_i32 t0, t0, $1
 | 
						|
  mov_i32 t0, $1
 | 
						|
 | 
						|
  only the last instruction is kept.
 | 
						|
 | 
						|
3.4) Instruction Reference
 | 
						|
 | 
						|
********* Function call
 | 
						|
 | 
						|
* call <ret> <params> ptr
 | 
						|
 | 
						|
call function 'ptr' (pointer type)
 | 
						|
 | 
						|
<ret> optional 32 bit or 64 bit return value
 | 
						|
<params> optional 32 bit or 64 bit parameters
 | 
						|
 | 
						|
********* Jumps/Labels
 | 
						|
 | 
						|
* set_label $label
 | 
						|
 | 
						|
Define label 'label' at the current program point.
 | 
						|
 | 
						|
* br $label
 | 
						|
 | 
						|
Jump to label.
 | 
						|
 | 
						|
* brcond_i32/i64 t0, t1, cond, label
 | 
						|
 | 
						|
Conditional jump if t0 cond t1 is true. cond can be:
 | 
						|
    TCG_COND_EQ
 | 
						|
    TCG_COND_NE
 | 
						|
    TCG_COND_LT /* signed */
 | 
						|
    TCG_COND_GE /* signed */
 | 
						|
    TCG_COND_LE /* signed */
 | 
						|
    TCG_COND_GT /* signed */
 | 
						|
    TCG_COND_LTU /* unsigned */
 | 
						|
    TCG_COND_GEU /* unsigned */
 | 
						|
    TCG_COND_LEU /* unsigned */
 | 
						|
    TCG_COND_GTU /* unsigned */
 | 
						|
 | 
						|
********* Arithmetic
 | 
						|
 | 
						|
* add_i32/i64 t0, t1, t2
 | 
						|
 | 
						|
t0=t1+t2
 | 
						|
 | 
						|
* sub_i32/i64 t0, t1, t2
 | 
						|
 | 
						|
t0=t1-t2
 | 
						|
 | 
						|
* neg_i32/i64 t0, t1
 | 
						|
 | 
						|
t0=-t1 (two's complement)
 | 
						|
 | 
						|
* mul_i32/i64 t0, t1, t2
 | 
						|
 | 
						|
t0=t1*t2
 | 
						|
 | 
						|
* div_i32/i64 t0, t1, t2
 | 
						|
 | 
						|
t0=t1/t2 (signed). Undefined behavior if division by zero or overflow.
 | 
						|
 | 
						|
* divu_i32/i64 t0, t1, t2
 | 
						|
 | 
						|
t0=t1/t2 (unsigned). Undefined behavior if division by zero.
 | 
						|
 | 
						|
* rem_i32/i64 t0, t1, t2
 | 
						|
 | 
						|
t0=t1%t2 (signed). Undefined behavior if division by zero or overflow.
 | 
						|
 | 
						|
* remu_i32/i64 t0, t1, t2
 | 
						|
 | 
						|
t0=t1%t2 (unsigned). Undefined behavior if division by zero.
 | 
						|
 | 
						|
********* Logical
 | 
						|
 | 
						|
* and_i32/i64 t0, t1, t2
 | 
						|
 | 
						|
t0=t1&t2
 | 
						|
 | 
						|
* or_i32/i64 t0, t1, t2
 | 
						|
 | 
						|
t0=t1|t2
 | 
						|
 | 
						|
* xor_i32/i64 t0, t1, t2
 | 
						|
 | 
						|
t0=t1^t2
 | 
						|
 | 
						|
* not_i32/i64 t0, t1
 | 
						|
 | 
						|
t0=~t1
 | 
						|
 | 
						|
* andc_i32/i64 t0, t1, t2
 | 
						|
 | 
						|
t0=t1&~t2
 | 
						|
 | 
						|
* eqv_i32/i64 t0, t1, t2
 | 
						|
 | 
						|
t0=~(t1^t2), or equivalently, t0=t1^~t2
 | 
						|
 | 
						|
* nand_i32/i64 t0, t1, t2
 | 
						|
 | 
						|
t0=~(t1&t2)
 | 
						|
 | 
						|
* nor_i32/i64 t0, t1, t2
 | 
						|
 | 
						|
t0=~(t1|t2)
 | 
						|
 | 
						|
* orc_i32/i64 t0, t1, t2
 | 
						|
 | 
						|
t0=t1|~t2
 | 
						|
 | 
						|
********* Shifts/Rotates
 | 
						|
 | 
						|
* shl_i32/i64 t0, t1, t2
 | 
						|
 | 
						|
t0=t1 << t2. Unspecified behavior if t2 < 0 or t2 >= 32 (resp 64)
 | 
						|
 | 
						|
* shr_i32/i64 t0, t1, t2
 | 
						|
 | 
						|
t0=t1 >> t2 (unsigned). Unspecified behavior if t2 < 0 or t2 >= 32 (resp 64)
 | 
						|
 | 
						|
* sar_i32/i64 t0, t1, t2
 | 
						|
 | 
						|
t0=t1 >> t2 (signed). Unspecified behavior if t2 < 0 or t2 >= 32 (resp 64)
 | 
						|
 | 
						|
* rotl_i32/i64 t0, t1, t2
 | 
						|
 | 
						|
Rotation of t2 bits to the left.
 | 
						|
Unspecified behavior if t2 < 0 or t2 >= 32 (resp 64)
 | 
						|
 | 
						|
* rotr_i32/i64 t0, t1, t2
 | 
						|
 | 
						|
Rotation of t2 bits to the right.
 | 
						|
Unspecified behavior if t2 < 0 or t2 >= 32 (resp 64)
 | 
						|
 | 
						|
********* Misc
 | 
						|
 | 
						|
* mov_i32/i64 t0, t1
 | 
						|
 | 
						|
t0 = t1
 | 
						|
 | 
						|
Move t1 to t0 (both operands must have the same type).
 | 
						|
 | 
						|
* ext8s_i32/i64 t0, t1
 | 
						|
ext8u_i32/i64 t0, t1
 | 
						|
ext16s_i32/i64 t0, t1
 | 
						|
ext16u_i32/i64 t0, t1
 | 
						|
ext32s_i64 t0, t1
 | 
						|
ext32u_i64 t0, t1
 | 
						|
 | 
						|
8, 16 or 32 bit sign/zero extension (both operands must have the same type)
 | 
						|
 | 
						|
* bswap16_i32/i64 t0, t1
 | 
						|
 | 
						|
16 bit byte swap on a 32/64 bit value. It assumes that the two/six high order
 | 
						|
bytes are set to zero.
 | 
						|
 | 
						|
* bswap32_i32/i64 t0, t1
 | 
						|
 | 
						|
32 bit byte swap on a 32/64 bit value. With a 64 bit value, it assumes that
 | 
						|
the four high order bytes are set to zero.
 | 
						|
 | 
						|
* bswap64_i64 t0, t1
 | 
						|
 | 
						|
64 bit byte swap
 | 
						|
 | 
						|
* discard_i32/i64 t0
 | 
						|
 | 
						|
Indicate that the value of t0 won't be used later. It is useful to
 | 
						|
force dead code elimination.
 | 
						|
 | 
						|
* deposit_i32/i64 dest, t1, t2, pos, len
 | 
						|
 | 
						|
Deposit T2 as a bitfield into T1, placing the result in DEST.
 | 
						|
The bitfield is described by POS/LEN, which are immediate values:
 | 
						|
 | 
						|
  LEN - the length of the bitfield
 | 
						|
  POS - the position of the first bit, counting from the LSB
 | 
						|
 | 
						|
For example, pos=8, len=4 indicates a 4-bit field at bit 8.
 | 
						|
This operation would be equivalent to
 | 
						|
 | 
						|
  dest = (t1 & ~0x0f00) | ((t2 << 8) & 0x0f00)
 | 
						|
 | 
						|
* extrl_i64_i32 t0, t1
 | 
						|
 | 
						|
For 64-bit hosts only, extract the low 32-bits of input T1 and place it
 | 
						|
into 32-bit output T0.  Depending on the host, this may be a simple move,
 | 
						|
or may require additional canonicalization.
 | 
						|
 | 
						|
* extrh_i64_i32 t0, t1
 | 
						|
 | 
						|
For 64-bit hosts only, extract the high 32-bits of input T1 and place it
 | 
						|
into 32-bit output T0.  Depending on the host, this may be a simple shift,
 | 
						|
or may require additional canonicalization.
 | 
						|
 | 
						|
********* Conditional moves
 | 
						|
 | 
						|
* setcond_i32/i64 dest, t1, t2, cond
 | 
						|
 | 
						|
dest = (t1 cond t2)
 | 
						|
 | 
						|
Set DEST to 1 if (T1 cond T2) is true, otherwise set to 0.
 | 
						|
 | 
						|
* movcond_i32/i64 dest, c1, c2, v1, v2, cond
 | 
						|
 | 
						|
dest = (c1 cond c2 ? v1 : v2)
 | 
						|
 | 
						|
Set DEST to V1 if (C1 cond C2) is true, otherwise set to V2.
 | 
						|
 | 
						|
********* Type conversions
 | 
						|
 | 
						|
* ext_i32_i64 t0, t1
 | 
						|
Convert t1 (32 bit) to t0 (64 bit) and does sign extension
 | 
						|
 | 
						|
* extu_i32_i64 t0, t1
 | 
						|
Convert t1 (32 bit) to t0 (64 bit) and does zero extension
 | 
						|
 | 
						|
* trunc_i64_i32 t0, t1
 | 
						|
Truncate t1 (64 bit) to t0 (32 bit)
 | 
						|
 | 
						|
* concat_i32_i64 t0, t1, t2
 | 
						|
Construct t0 (64-bit) taking the low half from t1 (32 bit) and the high half
 | 
						|
from t2 (32 bit).
 | 
						|
 | 
						|
* concat32_i64 t0, t1, t2
 | 
						|
Construct t0 (64-bit) taking the low half from t1 (64 bit) and the high half
 | 
						|
from t2 (64 bit).
 | 
						|
 | 
						|
********* Load/Store
 | 
						|
 | 
						|
* ld_i32/i64 t0, t1, offset
 | 
						|
ld8s_i32/i64 t0, t1, offset
 | 
						|
ld8u_i32/i64 t0, t1, offset
 | 
						|
ld16s_i32/i64 t0, t1, offset
 | 
						|
ld16u_i32/i64 t0, t1, offset
 | 
						|
ld32s_i64 t0, t1, offset
 | 
						|
ld32u_i64 t0, t1, offset
 | 
						|
 | 
						|
t0 = read(t1 + offset)
 | 
						|
Load 8, 16, 32 or 64 bits with or without sign extension from host memory. 
 | 
						|
offset must be a constant.
 | 
						|
 | 
						|
* st_i32/i64 t0, t1, offset
 | 
						|
st8_i32/i64 t0, t1, offset
 | 
						|
st16_i32/i64 t0, t1, offset
 | 
						|
st32_i64 t0, t1, offset
 | 
						|
 | 
						|
write(t0, t1 + offset)
 | 
						|
Write 8, 16, 32 or 64 bits to host memory.
 | 
						|
 | 
						|
All this opcodes assume that the pointed host memory doesn't correspond
 | 
						|
to a global. In the latter case the behaviour is unpredictable.
 | 
						|
 | 
						|
********* Multiword arithmetic support
 | 
						|
 | 
						|
* add2_i32/i64 t0_low, t0_high, t1_low, t1_high, t2_low, t2_high
 | 
						|
* sub2_i32/i64 t0_low, t0_high, t1_low, t1_high, t2_low, t2_high
 | 
						|
 | 
						|
Similar to add/sub, except that the double-word inputs T1 and T2 are
 | 
						|
formed from two single-word arguments, and the double-word output T0
 | 
						|
is returned in two single-word outputs.
 | 
						|
 | 
						|
* mulu2_i32/i64 t0_low, t0_high, t1, t2
 | 
						|
 | 
						|
Similar to mul, except two unsigned inputs T1 and T2 yielding the full
 | 
						|
double-word product T0.  The later is returned in two single-word outputs.
 | 
						|
 | 
						|
* muls2_i32/i64 t0_low, t0_high, t1, t2
 | 
						|
 | 
						|
Similar to mulu2, except the two inputs T1 and T2 are signed.
 | 
						|
 | 
						|
********* 64-bit guest on 32-bit host support
 | 
						|
 | 
						|
The following opcodes are internal to TCG.  Thus they are to be implemented by
 | 
						|
32-bit host code generators, but are not to be emitted by guest translators.
 | 
						|
They are emitted as needed by inline functions within "tcg-op.h".
 | 
						|
 | 
						|
* brcond2_i32 t0_low, t0_high, t1_low, t1_high, cond, label
 | 
						|
 | 
						|
Similar to brcond, except that the 64-bit values T0 and T1
 | 
						|
are formed from two 32-bit arguments.
 | 
						|
 | 
						|
* setcond2_i32 dest, t1_low, t1_high, t2_low, t2_high, cond
 | 
						|
 | 
						|
Similar to setcond, except that the 64-bit values T1 and T2 are
 | 
						|
formed from two 32-bit arguments.  The result is a 32-bit value.
 | 
						|
 | 
						|
********* QEMU specific operations
 | 
						|
 | 
						|
* exit_tb t0
 | 
						|
 | 
						|
Exit the current TB and return the value t0 (word type).
 | 
						|
 | 
						|
* goto_tb index
 | 
						|
 | 
						|
Exit the current TB and jump to the TB index 'index' (constant) if the
 | 
						|
current TB was linked to this TB. Otherwise execute the next
 | 
						|
instructions. Only indices 0 and 1 are valid and tcg_gen_goto_tb may be issued
 | 
						|
at most once with each slot index per TB.
 | 
						|
 | 
						|
* qemu_ld_i32/i64 t0, t1, flags, memidx
 | 
						|
* qemu_st_i32/i64 t0, t1, flags, memidx
 | 
						|
 | 
						|
Load data at the guest address t1 into t0, or store data in t0 at guest
 | 
						|
address t1.  The _i32/_i64 size applies to the size of the input/output
 | 
						|
register t0 only.  The address t1 is always sized according to the guest,
 | 
						|
and the width of the memory operation is controlled by flags.
 | 
						|
 | 
						|
Both t0 and t1 may be split into little-endian ordered pairs of registers
 | 
						|
if dealing with 64-bit quantities on a 32-bit host.
 | 
						|
 | 
						|
The memidx selects the qemu tlb index to use (e.g. user or kernel access).
 | 
						|
The flags are the TCGMemOp bits, selecting the sign, width, and endianness
 | 
						|
of the memory access.
 | 
						|
 | 
						|
For a 32-bit host, qemu_ld/st_i64 is guaranteed to only be used with a
 | 
						|
64-bit memory access specified in flags.
 | 
						|
 | 
						|
*********
 | 
						|
 | 
						|
Note 1: Some shortcuts are defined when the last operand is known to be
 | 
						|
a constant (e.g. addi for add, movi for mov).
 | 
						|
 | 
						|
Note 2: When using TCG, the opcodes must never be generated directly
 | 
						|
as some of them may not be available as "real" opcodes. Always use the
 | 
						|
function tcg_gen_xxx(args).
 | 
						|
 | 
						|
4) Backend
 | 
						|
 | 
						|
tcg-target.h contains the target specific definitions. tcg-target.c
 | 
						|
contains the target specific code.
 | 
						|
 | 
						|
4.1) Assumptions
 | 
						|
 | 
						|
The target word size (TCG_TARGET_REG_BITS) is expected to be 32 bit or
 | 
						|
64 bit. It is expected that the pointer has the same size as the word.
 | 
						|
 | 
						|
On a 32 bit target, all 64 bit operations are converted to 32 bits. A
 | 
						|
few specific operations must be implemented to allow it (see add2_i32,
 | 
						|
sub2_i32, brcond2_i32).
 | 
						|
 | 
						|
On a 64 bit target, the values are transfered between 32 and 64-bit
 | 
						|
registers using the following ops:
 | 
						|
- trunc_shr_i64_i32
 | 
						|
- ext_i32_i64
 | 
						|
- extu_i32_i64
 | 
						|
 | 
						|
They ensure that the values are correctly truncated or extended when
 | 
						|
moved from a 32-bit to a 64-bit register or vice-versa. Note that the
 | 
						|
trunc_shr_i64_i32 is an optional op. It is not necessary to implement
 | 
						|
it if all the following conditions are met:
 | 
						|
- 64-bit registers can hold 32-bit values
 | 
						|
- 32-bit values in a 64-bit register do not need to stay zero or
 | 
						|
  sign extended
 | 
						|
- all 32-bit TCG ops ignore the high part of 64-bit registers
 | 
						|
 | 
						|
Floating point operations are not supported in this version. A
 | 
						|
previous incarnation of the code generator had full support of them,
 | 
						|
but it is better to concentrate on integer operations first.
 | 
						|
 | 
						|
4.2) Constraints
 | 
						|
 | 
						|
GCC like constraints are used to define the constraints of every
 | 
						|
instruction. Memory constraints are not supported in this
 | 
						|
version. Aliases are specified in the input operands as for GCC.
 | 
						|
 | 
						|
The same register may be used for both an input and an output, even when
 | 
						|
they are not explicitly aliased.  If an op expands to multiple target
 | 
						|
instructions then care must be taken to avoid clobbering input values.
 | 
						|
GCC style "early clobber" outputs are not currently supported.
 | 
						|
 | 
						|
A target can define specific register or constant constraints. If an
 | 
						|
operation uses a constant input constraint which does not allow all
 | 
						|
constants, it must also accept registers in order to have a fallback.
 | 
						|
 | 
						|
The movi_i32 and movi_i64 operations must accept any constants.
 | 
						|
 | 
						|
The mov_i32 and mov_i64 operations must accept any registers of the
 | 
						|
same type.
 | 
						|
 | 
						|
The ld/st instructions must accept signed 32 bit constant offsets. It
 | 
						|
can be implemented by reserving a specific register to compute the
 | 
						|
address if the offset is too big.
 | 
						|
 | 
						|
The ld/st instructions must accept any destination (ld) or source (st)
 | 
						|
register.
 | 
						|
 | 
						|
4.3) Function call assumptions
 | 
						|
 | 
						|
- The only supported types for parameters and return value are: 32 and
 | 
						|
  64 bit integers and pointer.
 | 
						|
- The stack grows downwards.
 | 
						|
- The first N parameters are passed in registers.
 | 
						|
- The next parameters are passed on the stack by storing them as words.
 | 
						|
- Some registers are clobbered during the call. 
 | 
						|
- The function can return 0 or 1 value in registers. On a 32 bit
 | 
						|
  target, functions must be able to return 2 values in registers for
 | 
						|
  64 bit return type.
 | 
						|
 | 
						|
5) Recommended coding rules for best performance
 | 
						|
 | 
						|
- Use globals to represent the parts of the QEMU CPU state which are
 | 
						|
  often modified, e.g. the integer registers and the condition
 | 
						|
  codes. TCG will be able to use host registers to store them.
 | 
						|
 | 
						|
- Avoid globals stored in fixed registers. They must be used only to
 | 
						|
  store the pointer to the CPU state and possibly to store a pointer
 | 
						|
  to a register window.
 | 
						|
 | 
						|
- Use temporaries. Use local temporaries only when really needed,
 | 
						|
  e.g. when you need to use a value after a jump. Local temporaries
 | 
						|
  introduce a performance hit in the current TCG implementation: their
 | 
						|
  content is saved to memory at end of each basic block.
 | 
						|
 | 
						|
- Free temporaries and local temporaries when they are no longer used
 | 
						|
  (tcg_temp_free). Since tcg_const_x() also creates a temporary, you
 | 
						|
  should free it after it is used. Freeing temporaries does not yield
 | 
						|
  a better generated code, but it reduces the memory usage of TCG and
 | 
						|
  the speed of the translation.
 | 
						|
 | 
						|
- Don't hesitate to use helpers for complicated or seldom used guest
 | 
						|
  instructions. There is little performance advantage in using TCG to
 | 
						|
  implement guest instructions taking more than about twenty TCG
 | 
						|
  instructions. Note that this rule of thumb is more applicable to
 | 
						|
  helpers doing complex logic or arithmetic, where the C compiler has
 | 
						|
  scope to do a good job of optimisation; it is less relevant where
 | 
						|
  the instruction is mostly doing loads and stores, and in those cases
 | 
						|
  inline TCG may still be faster for longer sequences.
 | 
						|
 | 
						|
- The hard limit on the number of TCG instructions you can generate
 | 
						|
  per guest instruction is set by MAX_OP_PER_INSTR in exec-all.h --
 | 
						|
  you cannot exceed this without risking a buffer overrun.
 | 
						|
 | 
						|
- Use the 'discard' instruction if you know that TCG won't be able to
 | 
						|
  prove that a given global is "dead" at a given program point. The
 | 
						|
  x86 guest uses it to improve the condition codes optimisation.
 |