How to use align

2018-06-03 18:22:53

I am trying to run the following:

type
  Vector = array [1..4] of Single;

{$CODEALIGN 16}
function add4(const a, b: Vector): Vector; register; assembler;
asm
  movaps xmm0, [a]
  movaps xmm1, [b]
  addps xmm0, xmm1
  movaps [@result], xmm0
end;

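It raises an access violation on movaps. As far as I know, movaps can only be trusted if the memory location is 16-byte aligned. With movups (which does not require alignment), it works without problems.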
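To make the alignment requirement concrete, here is a minimal C sketch. It uses C rather than Delphi only to illustrate the arithmetic, and `is_aligned` is a hypothetical helper, not part of any Delphi RTL. An address satisfies movaps only when it is a multiple of 16, that is, when its low four bits are zero:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: true if p is a multiple of `alignment`
   (which must be a power of two). movaps needs is_aligned(p, 16);
   movups accepts any address. */
static bool is_aligned(const void *p, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return ((uintptr_t)p & (alignment - 1)) == 0;
}
```

An address that is merely 4-byte aligned, such as a pointer 4 bytes past a 16-byte boundary, passes the 4-byte check but fails the 16-byte one. That is the situation in which movaps faults and movups does not.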

So my question is: in Delphi XE3, {$CODEALIGN} does not seem to work in this case.

EDIT

Very strange... I tried the following.

program Project3;

{$APPTYPE CONSOLE}

uses
  windows;  // if not using windows, no errors at all

type
  Vector = array [1..4] of Single;

function add4(const a, b: Vector): Vector;
asm
  movaps xmm0, [a]
  movaps xmm1, [b]
  addps xmm0, xmm1
  movaps [@result], xmm0
end;

procedure test();
var
  v1, v2: vector;
begin
  v1[1] := 1;
  v2[1] := 1;
  v1 := add4(v1,v2);  // this works
end;

var
  a, b, c: Vector;

begin
  {$ifndef cpux64}
    {$MESSAGE FATAL 'this example is for x64 target only'}
  {$else}
  test();
  c := add4(a, b); // throw out AV here
  {$endif}
end.

Without 'uses windows', everything is fine. With 'uses windows', it raises an exception at c := add4(a, b), but not in test().

Can anyone explain this?

EDIT: Now it makes sense to me. For Delphi XE3 64-bit, the conclusion is:

The x64 stack frame is aligned to 16 bytes (as required), and {$CODEALIGN 16} aligns the code (not the data) of procedures and functions to 16 bytes.

Dynamic arrays live on the heap and can be made 16-byte aligned with SetMinimumBlockAlignment(mba16Byte).

However, stack variables are not always 16-byte aligned. For example, if an Integer variable is declared before v1, v2 in test() above, the example no longer works.
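The effect of that extra Integer can be modelled with a C struct. This is an analogy only: the real Delphi stack layout is chosen by the compiler, and `Frame`/`AlignedFrame` are hypothetical names. Because the array only needs 4-byte alignment, it is placed right after the 4-byte integer, at offset 4, so even a 16-byte aligned frame leaves it misaligned. C11's `_Alignas` can force the padding; the Delphi code in the question requests nothing comparable:

```c
#include <assert.h>
#include <stddef.h>

/* A 16-byte vector whose natural alignment is only that of float (4). */
typedef float Vector[4];

/* Models a frame in which an integer is declared before v1 and v2. */
struct Frame {
    int i;      /* 4 bytes */
    Vector v1;  /* placed directly after i, at offset 4 */
    Vector v2;  /* offset 20 */
};

/* Same layout with v1 forced onto a 16-byte boundary (C11 _Alignas). */
struct AlignedFrame {
    int i;
    _Alignas(16) Vector v1;  /* offset 16: padding is inserted after i */
    Vector v2;               /* offset 32 */
};
```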

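You need your data to be 16-byte aligned, and that requires some care and attention. You can make sure that the heap allocator aligns to 16 bytes. But you cannot make sure that the compiler will 16-byte align your stack-allocated variables, because the alignment of your array type is 4, the size of its elements. Any variable of that type declared inside another structure will likewise only be 4-byte aligned. That is a tough hurdle to clear.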
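The rule that an array is aligned like its element type is not specific to Delphi; C11 spells it out for `alignof`. A minimal C sketch (`Particle` is a hypothetical record type) shows both halves of the point: the vector itself only needs float alignment, and a record that embeds it gains nothing stronger:

```c
#include <assert.h>
#include <stdalign.h>

typedef float Vector[4];   /* 16 bytes, but aligned like a single float */

/* Hypothetical record that embeds a Vector. */
struct Particle {
    Vector position;
    int id;
};
```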

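I don't think you can solve your problem with the currently available versions of the compiler, at least not unless you give up stack-allocated variables, which I'd guess is a bitter pill to swallow. You might have better luck with an external assembler.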

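You can write your own memory allocation routines that allocate aligned data on the heap. You can specify your own alignment size (not only 16 bytes, but also 32 bytes, 64 bytes and so on):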

    procedure GetMemAligned(const bits: Integer; const src: Pointer;
      const SrcSize: Integer; out DstAligned, DstUnaligned: Pointer;
      out DstSize: Integer);
    var
      Bytes: NativeInt;
      i: NativeInt;
    begin
      if src <> nil then
      begin
        i := NativeInt(src);
        i := i shr bits;
        i := i shl bits;
        if i = NativeInt(src) then
        begin
          // the source is already aligned, nothing to do
          DstAligned := src;
          DstUnaligned := src;
          DstSize := SrcSize;
          Exit;
        end;
      end;
      Bytes := 1 shl bits;
      DstSize := SrcSize + Bytes;
      GetMem(DstUnaligned, DstSize);
      FillChar(DstUnaligned^, DstSize, 0);
      i := NativeInt(DstUnaligned) + Bytes;
      i := i shr bits;
      i := i shl bits;
      DstAligned := Pointer(i);
      if src <> nil then
        Move(src^, DstAligned^, SrcSize);
    end;

    procedure FreeMemAligned(const src: Pointer; var DstUnaligned: Pointer;
      var DstSize: Integer);
    begin
      if src <> DstUnaligned then
      begin
        if DstUnaligned <> nil then
          FreeMem(DstUnaligned, DstSize);
      end;
      DstUnaligned := nil;
      DstSize := 0;
    end;
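For comparison, here is the same over-allocate-and-round-up technique as a C sketch, assuming `malloc`/`free` in place of `GetMem`/`FreeMem`. `alloc_aligned` and `free_aligned` are hypothetical names, and they take the alignment in bytes rather than as a bit count. The caller must keep the raw pointer, because that pointer, not the aligned one, is what gets freed:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Mirrors GetMemAligned with src = nil: over-allocate by `alignment`
   bytes, zero the block, then round the address up past the raw start. */
static void *alloc_aligned(size_t size, size_t alignment, void **raw_out)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    unsigned char *raw = malloc(size + alignment);
    *raw_out = raw;                      /* keep this: it is what must be freed */
    if (raw == NULL)
        return NULL;
    memset(raw, 0, size + alignment);
    /* Same effect as ((raw + Bytes) shr bits) shl bits in the Delphi code. */
    uintptr_t aligned = ((uintptr_t)raw + alignment) & ~(uintptr_t)(alignment - 1);
    return (void *)aligned;
}

/* Mirrors FreeMemAligned: release the original, unaligned block. */
static void free_aligned(void **raw)
{
    free(*raw);
    *raw = NULL;
}
```

Rounding `raw + alignment` down always yields an address in `(raw, raw + alignment]`, so `size` bytes starting there still fit inside the `size + alignment` bytes that were allocated.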

Then use pointers, and write procedures that take a third parameter to return the result.

You could also use functions, but that is less straightforward.

type
  PVector = ^TVector;
  TVector = packed array [1..4] of Single;

Then allocate these objects like this:

const
   SizeAligned = SizeOf(TVector);
var
   DataUnaligned, DataAligned: Pointer;
   SizeUnaligned: Integer;
   V1: PVector;
begin
  GetMemAligned(4 {align by 4 bits, i.e. by 16 bytes}, nil, SizeAligned, DataAligned, DataUnaligned, SizeUnaligned);
  V1 := DataAligned;
  // now you can work with your vector via V1^ - it is aligned by 16 bytes and stays in the heap

  FreeMemAligned(nil, DataUnaligned, SizeUnaligned);
end;

As you can see, we passed nil to GetMemAligned and FreeMemAligned. That parameter is needed when we want to align existing data, for example data we have received as a function argument.
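The existing-data path (non-nil `src`) can be sketched in C the same way, again with hypothetical names (`align_existing`, `release_aligned`) and `malloc`/`free` standing in for the Delphi heap. Data that is already aligned is used in place, anything else is copied into a fresh aligned block, and only that copy is freed afterwards:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Mirrors the src <> nil path of GetMemAligned. *raw_out receives the
   block to free later, or src itself when no copy was needed. */
static void *align_existing(const void *src, size_t size, size_t alignment,
                            void **raw_out)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    if (((uintptr_t)src & (alignment - 1)) == 0) {
        *raw_out = (void *)src;          /* already aligned: nothing allocated */
        return (void *)src;
    }
    unsigned char *raw = malloc(size + alignment);
    *raw_out = raw;                      /* NULL if allocation failed */
    if (raw == NULL)
        return NULL;
    uintptr_t aligned = ((uintptr_t)raw + alignment) & ~(uintptr_t)(alignment - 1);
    memcpy((void *)aligned, src, size);  /* copy the caller's data */
    return (void *)aligned;
}

/* Mirrors FreeMemAligned: free only if a copy was actually made. */
static void release_aligned(const void *src, void **raw)
{
    if (*raw != src)
        free(*raw);
    *raw = NULL;
}
```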

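Just use register names directly in the assembler code instead of parameter names. With the register calling convention you then won't mess anything up; otherwise you might modify a register without realizing that the parameter name you are using is just an alias for that register.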

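Under Win64, with the Microsoft calling convention, the first parameter is always passed in RCX, the second in RDX, the third in R8, the fourth in R9, and the rest on the stack. A function returns its result in RAX. However, if a function returns a structure ("record") result that is not exactly 1, 2, 4 or 8 bytes in size, such as the 16-byte vector here, it is not returned in RAX but by address, through an implicit parameter. Your function may modify the following registers without restoring them: RAX, RCX, RDX, R8, R9, R10, R11, as well as XMM0 to XMM5. All other registers (including XMM6 to XMM15) must be preserved. For more details, see https://msdn.microsoft.com/en-us/library/ms235286.aspx.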

Under Win32, with the Delphi register calling convention, the first parameter is passed in EAX, the second in EDX, the third in ECX, and the rest on the stack.

The following table summarizes the differences:

    Param   Win64   Win32
    -----   -----   -----
    1       rcx     eax
    2       rdx     edx
    3       r8      ecx
    4       r9      stack

So your function would look like this (32-bit):

procedure add4(const a, b: TVector; out Result: TVector); register; assembler;
asm
  movaps xmm0, [eax]
  movaps xmm1, [edx]
  addps xmm0, xmm1
  movaps [ecx], xmm0
end;

Under 64-bit:

procedure add4(const a, b: TVector; out Result: TVector); register; assembler;
asm
  movaps xmm0, [rcx]
  movaps xmm1, [rdx]
  addps xmm0, xmm1
  movaps [r8], xmm0
end;
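If you want to check the same procedure-style logic from C, add4 can be written with the SSE intrinsics from `<xmmintrin.h>`: `_mm_load_ps` and `_mm_store_ps` are the aligned load and store (the compiler may also fold a load into addps's memory operand). This is a sketch. The `#if` fallback to a plain loop on non-SSE targets is an assumption added so that the code compiles anywhere:

```c
#include <assert.h>
#include <stdint.h>
#if defined(__SSE__) || defined(_M_X64)
#include <xmmintrin.h>
#endif

/* Procedure-style add4: a, b and result are passed by address, like the
   RCX/RDX/R8 version above. All three must be 16-byte aligned. */
static void add4(const float *a, const float *b, float *result)
{
    assert(((uintptr_t)a | (uintptr_t)b | (uintptr_t)result) % 16 == 0);
#if defined(__SSE__) || defined(_M_X64)
    __m128 x = _mm_load_ps(a);              /* aligned load, like movaps xmm0, [rcx] */
    __m128 y = _mm_load_ps(b);              /* aligned load, like movaps xmm1, [rdx] */
    _mm_store_ps(result, _mm_add_ps(x, y)); /* addps, then aligned store to [r8] */
#else
    for (int i = 0; i < 4; i++)             /* portable fallback without SSE */
        result[i] = a[i] + b[i];
#endif
}
```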

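By the way, according to Microsoft, floating-point arguments in the 64-bit calling convention are passed directly in XMM registers: the first in XMM0, the second in XMM1, the third in XMM2 and the fourth in XMM3, with the rest on the stack. The register is chosen by the argument's position, so a floating-point value in the second position goes in XMM1 even if the first argument is an integer. So you can pass individual Single/Double values by value rather than by reference; a 16-byte TVector, however, is still passed by reference.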
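The positional rule can be written down as a small C lookup. This is a model of the rule described above, not something that inspects compiled code; `win64_arg_register` is a hypothetical helper, and it ignores hidden parameters such as a result pointer or Self. For arguments shaped like `void f(int i, float x, int j, float y)`, the four values land in RCX, XMM1, R8 and XMM3:

```c
#include <assert.h>
#include <string.h>

/* Microsoft x64 convention: the register depends on the argument's
   position. A float/double uses XMMn; any other argument that fits in
   8 bytes uses the matching integer register. Arguments larger than
   8 bytes (such as a 16-byte vector) are passed by reference, so they
   occupy an integer slot holding their address. */
static const char *win64_arg_register(int index, int is_float)
{
    static const char *const gpr[] = { "rcx", "rdx", "r8", "r9" };
    static const char *const xmm[] = { "xmm0", "xmm1", "xmm2", "xmm3" };
    assert(index >= 0);
    if (index >= 4)
        return "stack";
    return is_float ? xmm[index] : gpr[index];
}
```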

Use this to make the built-in memory manager allocate blocks with 16-byte alignment:

SetMinimumBlockAlignment(mba16Byte);
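Outside Delphi, the C11 counterpart is a per-call request rather than a process-wide setting. Here is a minimal sketch with `aligned_alloc` (`alloc16` is a hypothetical wrapper name). C11 requires the size to be a multiple of the alignment, so the wrapper rounds it up. MSVC does not provide `aligned_alloc`; it offers `_aligned_malloc`/`_aligned_free` instead:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Request 16-byte aligned heap memory from the C library (C11).
   The size is rounded up to the next multiple of 16 first. */
static void *alloc16(size_t size)
{
    size_t rounded = (size + 15) & ~(size_t)15;
    void *p = aligned_alloc(16, rounded);
    assert(p == NULL || (uintptr_t)p % 16 == 0);
    return p;   /* release with free() */
}
```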

Also, as far as I know, both 'register' and 'assembler' are redundant directives here, so you can drop them from your code.

EDIT: You mentioned this is for x64. I just tried the following in Delphi XE2, compiled for x64, and it works here.

program Project3;

type
  Vector = array [1..4] of Single;

function add4(const a, b: Vector): Vector;
asm
  movaps xmm0, [a]
  movaps xmm1, [b]
  addps xmm0, xmm1
  movaps [@result], xmm0
end;

procedure f();
var
  v1,v2 : vector;
begin
  v1[1] := 1;
  v2[1] := 1;
  v1 := add4(v1,v2);
end;

begin
  {$ifndef cpux64}
  {$MESSAGE FATAL 'this example is for x64 target only'}
  {$else}
  f();
  {$endif}
end.
