How to use align
I was trying to run the following,
type
Vector = array [1..4] of Single;
{$CODEALIGN 16}
function add4(const a, b: Vector): Vector; register; assembler;
asm
movaps xmm0, [a]
movaps xmm1, [b]
addps xmm0, xmm1
movaps [@result], xmm0
end;
It gives an access violation on movaps. As far as I know, movaps can only be used if the memory location is 16-byte aligned. It works with no problem if I use movups (no alignment needed).
So my question is: why does {$CODEALIGN} seem to have no effect in this case in Delphi XE3?
EDIT
Very strange... I tried the following.
program Project3;
{$APPTYPE CONSOLE}
uses
windows; // if not using windows, no errors at all
type
Vector = array [1..4] of Single;
function add4(const a, b: Vector): Vector;
asm
movaps xmm0, [a]
movaps xmm1, [b]
addps xmm0, xmm1
movaps [@result], xmm0
end;
procedure test();
var
v1, v2: vector;
begin
v1[1] := 1;
v2[1] := 1;
v1 := add4(v1,v2); // this works
end;
var
a, b, c: Vector;
begin
{$ifndef cpux64}
{$MESSAGE FATAL 'this example is for x64 target only'}
{$else}
test();
c := add4(a, b); // raises an AV here
{$endif}
end.
If 'uses windows' is not added, everything is fine. With 'uses windows', it raises an exception at c := add4(a, b) but not in test().
Who can explain this?
EDIT: It all makes sense to me now. The conclusions for Delphi XE3 (64-bit) are
You need your data to be 16-byte aligned. That requires some care and attention. You can make sure that the heap allocator aligns to 16 bytes. But you cannot make sure that the compiler will 16-byte align your stack-allocated variables, because your array has an alignment property of 4, the size of its elements. Any variables declared inside other structures will also have 4-byte alignment. That is a tough hurdle to clear.
I don't think you can solve your problem in the currently available versions of the compiler. At least not unless you forgo stack allocated variables which I'd guess to be too bitter a pill to swallow. You might have some luck with an external assembler.
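To see whether a particular variable happens to land on a 16-byte boundary, you can test the low four bits of its address. This is only a minimal sketch; IsAligned16 is a hypothetical helper name, not something from the RTL, and the result for a global or stack variable can change from build to build.

```pascal
program AlignCheck;
{$APPTYPE CONSOLE}

type
  Vector = array [1..4] of Single;

// Hypothetical helper: True when P lies on a 16-byte boundary,
// i.e. when the low four bits of the address are all zero.
function IsAligned16(P: Pointer): Boolean;
begin
  Result := (NativeUInt(P) and 15) = 0;
end;

var
  a: Vector;
begin
  // movaps is only safe on @a when this reports TRUE.
  Writeln('a is 16-byte aligned: ', IsAligned16(@a));
end.
```

If this reports FALSE for a variable, movaps on it will fault while movups will still work.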
You can write your own memory allocation routines that allocate aligned data in the heap. You can specify your own alignment size (not just 16 bytes but also 32 bytes, 64 bytes and so on...):
procedure GetMemAligned(const bits: Integer; const src: Pointer;
const SrcSize: Integer; out DstAligned, DstUnaligned: Pointer;
out DstSize: Integer);
var
Bytes: NativeInt;
i: NativeInt;
begin
if src <> nil then
begin
i := NativeInt(src);
i := i shr bits;
i := i shl bits;
if i = NativeInt(src) then
begin
// the source is already aligned, nothing to do
DstAligned := src;
DstUnaligned := src;
DstSize := SrcSize;
Exit;
end;
end;
Bytes := 1 shl bits;
DstSize := SrcSize + Bytes;
GetMem(DstUnaligned, DstSize);
FillChar(DstUnaligned^, DstSize, 0);
i := NativeInt(DstUnaligned) + Bytes;
i := i shr bits;
i := i shl bits;
DstAligned := Pointer(i);
if src <> nil then
Move(src^, DstAligned^, SrcSize);
end;
procedure FreeMemAligned(const src: Pointer; var DstUnaligned: Pointer;
var DstSize: Integer);
begin
if src <> DstUnaligned then
begin
if DstUnaligned <> nil then
FreeMem(DstUnaligned, DstSize);
end;
DstUnaligned := nil;
DstSize := 0;
end;
Then work through pointers, and use procedures that take a third argument to return the result.
You can also use functions, but how the result is returned is less obvious there.
type
PVector = ^TVector;
TVector = packed array [1..4] of Single;
Then allocate these objects like this:
const
SizeAligned = SizeOf(TVector);
var
DataUnaligned, DataAligned: Pointer;
SizeUnaligned: Integer;
V1: PVector;
begin
GetMemAligned(4 {align by 4 bits, i.e. by 16 bytes}, nil, SizeAligned, DataAligned, DataUnaligned, SizeUnaligned);
V1 := DataAligned;
// now you can work with your vector via V1^ - it is aligned by 16 bytes and stays in the heap
FreeMemAligned(nil, DataUnaligned, SizeUnaligned);
end;
As you may have noticed, we passed nil to GetMemAligned and FreeMemAligned. This parameter is needed when we want to align existing data, for example data received as a function argument.
Just use plain register names rather than parameter names in assembly routines. You will not break anything that way when using the register calling convention. Otherwise you risk modifying registers without realizing that the parameter names are just aliases for them.
Under Win64, with the Microsoft calling convention, the first integer or pointer parameter is always passed in RCX, the second in RDX, the third in R8, the fourth in R9, and the rest on the stack. A function returns an integer or pointer result in RAX. But if a function returns a structure ("record") result, it is not returned in RAX but through an implicit argument, by address. Your function may modify the following registers: RAX, RCX, RDX, R8, R9, R10, R11. The rest must be preserved. See https://msdn.microsoft.com/en-us/library/ms235286.aspx for more details.
Under Win32, with the Delphi register calling convention, a call passes the first parameter in EAX, the second in EDX, the third in ECX, and the rest on the stack.
The following table summarizes the differences:
Parameter   Win64   Win32 (register)
---------   -----   ----------------
1)          rcx     eax
2)          rdx     edx
3)          r8      ecx
4)          r9      stack
So, your function will look like this (32-bit):
procedure add4(const a, b: TVector; out Result: TVector); register; assembler;
asm
movaps xmm0, [eax]
movaps xmm1, [edx]
addps xmm0, xmm1
movaps [ecx], xmm0
end;
Under 64-bit:
procedure add4(const a, b: TVector; out Result: TVector); register; assembler;
asm
movaps xmm0, [rcx]
movaps xmm1, [rdx]
addps xmm0, xmm1
movaps [r8], xmm0
end;
By the way, according to Microsoft, floating-point arguments in the 64-bit calling convention are passed directly in the XMM registers: the first in XMM0, the second in XMM1, the third in XMM2, the fourth in XMM3, and the rest on the stack. So you can pass them by value, not by reference.
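As an illustration of the by-value case, here is a minimal sketch for the x64 target only. AddSingles is a hypothetical name, and the sketch assumes two scalar Single arguments, which under the Win64 convention arrive in XMM0 and XMM1, with a Single result returned in XMM0. It does not apply to the 16-byte Vector array, which is still passed by address.

```pascal
program ScalarAdd;
{$APPTYPE CONSOLE}
{$ifndef cpux64}
  {$MESSAGE FATAL 'this example is for x64 target only'}
{$endif}

// Hypothetical example: a arrives in XMM0, b in XMM1 (Win64 convention),
// and the Single result is left in XMM0.
function AddSingles(a, b: Single): Single;
asm
  addss xmm0, xmm1
end;

begin
  Writeln(AddSingles(1.5, 2.25):0:2);
end.
```

No memory operand is involved here, so alignment is not a concern for the scalar version.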
Use this to make the built-in memory manager allocate with 16-byte alignment:
SetMinimumBlockAlignment(mba16Byte);
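A minimal sketch of how that call might be used, assuming the default built-in memory manager is in place (a replacement memory manager may ignore the setting). The TVector and PVector names are borrowed from the other answer, and the call is made before the allocation it is meant to affect. This only helps for heap blocks, not for global or stack variables.

```pascal
program HeapAlign;
{$APPTYPE CONSOLE}

type
  TVector = array [1..4] of Single;
  PVector = ^TVector;

var
  V: PVector;
begin
  // Ask the built-in memory manager for 16-byte aligned blocks
  // before allocating anything that movaps will touch.
  SetMinimumBlockAlignment(mba16Byte);
  New(V);
  try
    // Check the low four bits of the returned address.
    Writeln('V is 16-byte aligned: ', (NativeUInt(V) and 15) = 0);
  finally
    Dispose(V);
  end;
end.
```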
Also, as far as I know, both "register" and "assembler" are redundant directives so you can skip those from your code.
--
Edit: you mention this is for x64. I just tried the following in Delphi XE2 compiled for x64 and it works here.
program Project3;
type
Vector = array [1..4] of Single;
function add4(const a, b: Vector): Vector;
asm
movaps xmm0, [a]
movaps xmm1, [b]
addps xmm0, xmm1
movaps [@result], xmm0
end;
procedure f();
var
v1,v2 : vector;
begin
v1[1] := 1;
v2[1] := 1;
v1 := add4(v1,v2);
end;
begin
{$ifndef cpux64}
{$MESSAGE FATAL 'this example is for x64 target only'}
{$else}
f();
{$endif}
end.