llvm-for-llvmta/lib/Target/X86/README-FPStack.txt

//===---------------------------------------------------------------------===//
// Random ideas for the X86 backend: FP stack related stuff
//===---------------------------------------------------------------------===//

//===---------------------------------------------------------------------===//

Some targets (e.g. Athlons) prefer ffreep to fstp ST(0):
http://gcc.gnu.org/ml/gcc-patches/2004-04/msg00659.html

//===---------------------------------------------------------------------===//

This should use fiadd on chips where it is profitable:
double foo(double P, int *I) { return P+*I; }

We have fiadd patterns now, but the following two patterns have the same cost
and complexity. We need a way to specify that the latter is more profitable.

def FpADD32m  : FpI<(ops RFP:$dst, RFP:$src1, f32mem:$src2), OneArgFPRW,
                    [(set RFP:$dst, (fadd RFP:$src1,
                                     (extloadf64f32 addr:$src2)))]>;
                // ST(0) = ST(0) + [mem32]

def FpIADD32m : FpI<(ops RFP:$dst, RFP:$src1, i32mem:$src2), OneArgFPRW,
                    [(set RFP:$dst, (fadd RFP:$src1,
                                     (X86fild addr:$src2, i32)))]>;
                // ST(0) = ST(0) + [mem32int]

//===---------------------------------------------------------------------===//

The FP stackifier should handle simple permutations to reduce the number of shuffle
instructions, e.g. turning:

fld P	->		fld Q
fld Q			fld P
fxch

or:

fxch	->		fucomi
fucomi			jl X
jg X

Ideas:
http://gcc.gnu.org/ml/gcc-patches/2004-11/msg02410.html
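
To see why the first rewrite is safe, here is a toy C model of the x87
register stack. It is only a sketch: FPStack, fld, fxch and
swapped_loads_equivalent are hypothetical names made up for this note, not
part of the stackifier, and the model ignores stack overflow and tag bits.

```c
#include <stdbool.h>

/* Toy model of the x87 register stack: st[0] is ST(0), st[1] is ST(1).
   Hypothetical helper types and names, for illustration only. */
typedef struct {
    double st[8];
    int depth;
} FPStack;

/* fld: push a value; the old ST(0) becomes ST(1), and so on.
   No overflow check: callers here push at most two values. */
void fld(FPStack *s, double v) {
    for (int i = s->depth; i > 0; --i)
        s->st[i] = s->st[i - 1];
    s->st[0] = v;
    s->depth++;
}

/* fxch: swap ST(0) and ST(1). */
void fxch(FPStack *s) {
    double t = s->st[0];
    s->st[0] = s->st[1];
    s->st[1] = t;
}

/* Compare two modeled stacks slot by slot. */
bool same_stack(const FPStack *a, const FPStack *b) {
    if (a->depth != b->depth)
        return false;
    for (int i = 0; i < a->depth; ++i)
        if (a->st[i] != b->st[i])
            return false;
    return true;
}

/* True if "fld P; fld Q; fxch" leaves the same stack as "fld Q; fld P". */
bool swapped_loads_equivalent(double P, double Q) {
    FPStack a = {{0}, 0}, b = {{0}, 0};
    fld(&a, P); fld(&a, Q); fxch(&a);   /* original sequence  */
    fld(&b, Q); fld(&b, P);             /* rewritten sequence */
    return same_stack(&a, &b);
}
```

Both sequences end with P in ST(0) and Q in ST(1), so the fxch can be dropped
by issuing the loads in the opposite order.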


//===---------------------------------------------------------------------===//

Add a target-specific hook to the DAG combiner to handle SINT_TO_FP and
FP_TO_SINT when the source operand is already in memory.
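
For reference, these are the source-level shapes in question, written as
minimal C functions. The function names are hypothetical. On x87, fild takes
its integer operand from memory, so the first case could presumably load and
convert in one instruction; whether a combine is profitable in general is not
claimed here.

```c
/* SINT_TO_FP whose integer source is already in memory. */
double int_in_memory_to_double(const int *p) {
    return (double)*p;
}

/* FP_TO_SINT whose floating-point source is already in memory.
   The C cast truncates toward zero. */
int double_in_memory_to_int(const double *p) {
    return (int)*p;
}
```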

//===---------------------------------------------------------------------===//

Open code rint, floor, ceil, trunc:
http://gcc.gnu.org/ml/gcc-patches/2004-08/msg02006.html
http://gcc.gnu.org/ml/gcc-patches/2004-08/msg02011.html

Open code the sincos[f] libcall.
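
As an illustration of what "open coding" means here, the sketch below expands
trunc, floor and ceil inline using an integer round trip plus a fix-up. This is
a swapped-in technique chosen for brevity, not necessarily what the linked
patches do. The *_small names are hypothetical, and the functions assume a
finite input with |x| < 2^63; they also lose the sign of -0.0.

```c
/* Truncate toward zero via a 64-bit integer round trip.
   Assumption: x is finite and |x| < 2^63. Loses the sign of -0.0. */
double trunc_small(double x) {
    return (double)(long long)x;
}

/* floor: truncation rounds negative non-integers up, so step back down. */
double floor_small(double x) {
    double t = trunc_small(x);
    return (t > x) ? t - 1.0 : t;
}

/* ceil: truncation rounds positive non-integers down, so step back up. */
double ceil_small(double x) {
    double t = trunc_small(x);
    return (t < x) ? t + 1.0 : t;
}
```

A real expansion would also have to handle NaN, infinities and out-of-range
inputs, which is where most of the complexity lies.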

//===---------------------------------------------------------------------===//

None of the FPStack instructions are handled in
X86RegisterInfo::foldMemoryOperand, which prevents the spiller from
folding spill code into the instructions.

//===---------------------------------------------------------------------===//

Currently the x86 codegen isn't very good at mixing SSE and FPStack
code:

unsigned int foo(double x) { return x; }

foo:
	subl $20, %esp
	movsd 24(%esp), %xmm0
	movsd %xmm0, 8(%esp)
	fldl 8(%esp)
	fisttpll (%esp)
	movl (%esp), %eax
	addl $20, %esp
	ret

This just requires being smarter when custom expanding fptoui.
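
Two ways to express the conversion in terms of signed conversions, sketched in
C. The function names are hypothetical and both routes assume 0 <= x < 2^32.
The first mirrors the listing above (64-bit signed convert, keep the low 32
bits). The second uses only 32-bit signed conversions, which is the kind of
sequence an SSE-only expansion built on cvttsd2si might use; that mapping is an
assumption, not a description of the current lowering.

```c
#include <stdint.h>

/* Route 1: convert to signed 64-bit, then keep the low 32 bits.
   Assumption: 0 <= x < 2^32. */
uint32_t fptoui_via_i64(double x) {
    return (uint32_t)(int64_t)x;
}

/* Route 2: only signed 32-bit conversions.
   Values at or above 2^31 are shifted into signed range first, and the
   2^31 bias is added back after the conversion.
   Assumption: 0 <= x < 2^32. */
uint32_t fptoui_via_i32(double x) {
    const double two31 = 2147483648.0;
    if (x < two31)
        return (uint32_t)(int32_t)x;
    return (uint32_t)(int32_t)(x - two31) + 0x80000000u;
}
```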

//===---------------------------------------------------------------------===//