SIMD (SSE) instruction for division in GCC

2018-06-30 16:50:47

I'd like to optimize the following snippet using SSE instructions if possible:

/*
 * the data structure
 */
typedef struct v3d v3d;
struct v3d {
    double x;
    double y;
    double z;
} tmp = { 1.0, 2.0, 3.0 };

/*
 * the part that should be "optimized"
 */
tmp.x /= 4.0;
tmp.y /= 4.0;
tmp.z /= 4.0;

Is this possible at all?

I've used SIMD extensions under Windows, but not yet under Linux. That said, you should be able to take advantage of the DIVPS SSE instruction, which divides one vector of four floats by another. Since you are using doubles, you'll want the SSE2 version, DIVPD, which works on two doubles at a time. Also, make sure to build with the -msse2 switch.

I found a page which details some SSE GCC builtins. It looks kind of old, but should be a good start.

http://ds9a.nl/gcc-simd/

Is tmp.x *= 0.25; enough?

Note that for SSE instructions (in case you want to use them) it's important that:

1) all memory accesses are 16-byte aligned

2) the operations are performed in a loop

3) no int <-> float or float <-> double conversions are performed

4) divisions are avoided where possible

The intrinsic you are looking for is _mm_div_pd. Here is a working example which should be enough to steer you in the right direction:

#include <stdio.h>

#include <emmintrin.h>

typedef struct
{
    double x;
    double y;
    double z;
} v3d;

typedef union __attribute__ ((aligned(16)))
{
    v3d a;
    __m128d v[2];
} u3d;

int main(void)
{
    const __m128d vd = _mm_set1_pd(4.0);
    u3d u = { { 1.0, 2.0, 3.0 } };

    printf("v (before) = { %g %g %g }\n", u.a.x, u.a.y, u.a.z);

    u.v[0] = _mm_div_pd(u.v[0], vd);
    u.v[1] = _mm_div_pd(u.v[1], vd);

    printf("v (after) = { %g %g %g }\n", u.a.x, u.a.y, u.a.z);

    return 0;
}
