优化此汇编代码

2018-06-02 22:47:51

我现在正在参加计算机体系结构课程，我们将回顾基本的R型和I型指令（也是RISC架构）等等。我似乎无法弄清楚如何优化这个码。

说明：该代码在数组中添加单词（由$ s1指向），直到达到零。结果存储在$ t1中。 $ t0包含当前单词。

        add   $t1, $zero, $zero      # Initialize result to zero
again:  
        lw    $t0, 0($s1)            # Load the word from the array
        beq   $t0, $zero, done       # Terminate if current word is a zero
        add   $t1, $t1, $t0          # Add current word to result
        addi  $s1, $s1, 4            # Point to the next word in the array
        beq   $t1, $t1, again        # Loop again
done:
        nop                          # Do nothing

我在优化代码方面遇到了困难。我beq $t1, $t1, again感受到beq $t1, $t1, again （因为它总是真的）是不必要的，但我不知道如何删除它。这是我的尝试，但我现在意识到我的代码不会终止。

        add   $t1, $zero, $zero      # Initialize result to zero
again:  
        lw    $t0, 0($s1)            # Load the word from the array
        add   $t1, $t1, $t0          # Add current word to result
        addi  $s1, $s1, 4            # Point to the next word in the array
        bne   $t1, $zero, again      # If result is not zero, loop
done:
        nop                          # Do nothing

我从不检查终止零点并跳转到完成状态。但是如果我再添加一张支票，那么代码是否与以前一样？

通常，您想要将顶部循环的测试转换为底部循环的测试。要做到这一点，你经常需要跳到（或多或少）第一次迭代的循环体中间。在伪代码中，你现在所拥有的基本上是：

    initialize sum
beginning:
    load a word
    if (done) goto end
    add to sum
    increment pointer
    goto beginning
end:

为了优化，我们希望将结构更改为如下所示：

    initialize sum
    goto start_loop

beginning:
    add to sum
    increment pointer
start_loop:
    load a word
    if (!done) goto beginning

这样每个循环只有一个跳转而不是两个（这是一个很短的向后跳转，因此目标几乎总是在缓存中）。

也就是说，我应该补充一点，这种优化实际上已经过时 - 通过合适的分支预测，无条件跳转通常基本上是免费的。

编辑：由于循环展开已被提及，我会添加我的两分钱。分支预测通常会使循环展开过时，除非您可以使用它来并行执行更多指令。这不是一个问题，但在现实生活中通常很有用。例如，如果我们假设CPU有两个独立的加法器，我们可以展开循环的两个迭代，并将这些迭代的结果添加到两个单独的目标寄存器中，所以我们利用两个加法器的相同优势时间。然后，当循环终止时，我们将这两个寄存器相加以得到最终值。在类似C的伪代码中，会出现这样的情况：

odds = 0
evens = 0

do {   
    evens += pointer[0];
    odds += pointer[1];
    pointer += 2;
while (pointer[0] && pointer[1]);
total = odds + evens;

正如所写的，这增加了两个连续的零的小额外要求来终止序列而不是一个，并且要添加的数组中至少有两个项目。但是，请注意，这不是真正的循环展开，它提供了这里的主要优点，而是并行执行。

没有这一点，如果一个没有被采用的分支比一个被采用的分支便宜（即使两者都被正确预测），我们真的只能看到循环展开的好处。在较早的分支机构（例如，较早的英特尔公司），分支机构相对于未采用的分支机构处罚。与此同时，展开的循环将使用更多的缓存空间。因此，在现代处理器上，它通常是一个全面的损失（除非像我之前所说的那样，我们可以使用展开来获得并行性）。

展开你的循环：


        add   $t1, $zero, $zero      # Initialize result to zero
again:  
        lw    $t0, 0($s1)            # Load the word from the array
        beq   $t0, $zero, done       # Terminate if current word is a zero
        add   $t1, $t1, $t0          # Add current word to result
        addi  $s1, $s1, 4            # Point to the next word in the array

        lw    $t0, 0($s1)            # Load the word from the array
        beq   $t0, $zero, done       # Terminate if current word is a zero
        add   $t1, $t1, $t0          # Add current word to result
        addi  $s1, $s1, 4            # Point to the next word in the array

        lw    $t0, 0($s1)            # Load the word from the array
        beq   $t0, $zero, done       # Terminate if current word is a zero
        add   $t1, $t1, $t0          # Add current word to result
        addi  $s1, $s1, 4            # Point to the next word in the array

        lw    $t0, 0($s1)            # Load the word from the array
        beq   $t0, $zero, done       # Terminate if current word is a zero
        add   $t1, $t1, $t0          # Add current word to result
        addi  $s1, $s1, 4            # Point to the next word in the array

        # and so on to a reasonable amount, 4-8 times are common.

        b     again        # Loop again
done:
        nop

链接地址: http://www.djcxy.com/p/10355.html

上一篇: Optimize this assembly code

下一篇: Creating an empty placeholder fileset in Ant