CPU instruction sets for linear algebra?

I have to perform linear algebra calculations in C++ with a matrix that almost never changes and a lot of small vectors (only a handful of 3x3 or 4x4 matrices, and vectors with 3 components). I was thinking about using a CPU instruction set available on 32-bit x86, 64-bit x86, and ARMv5 and above to speed things up and simplify the design of my math operations.

Surprisingly, I haven't found an instruction set aimed at linear algebra. Most of them target general floating-point math, cached and optimized as you want, but nothing really for matrices and linear algebra. Is it just me, or is there no instruction set for linear algebra?

The new FMA3 from AMD looks interesting to start with, but it's still far too rare in current CPUs. I would like to stick to something as widespread as SSE on x86 or ARMv5 on ARM.

So, is there a popular instruction set for small, fast linear algebra computations? I could even accept a fair amount of numerical error if the speed is good enough.

EDIT:

I should also note that in practice my compilers are:

  • gcc
  • mingw
  • Visual Studio

so I would like an open-source, portable library that works on both x86 and ARM.

EDIT 2: Eigen doesn't support multithreaded execution, which is a big downside for me.


    You may already know this, but for the x86 architecture I can recommend Intel's BLAS, which uses AVX or AVX2. For details, see here: http://software.intel.com/en-us/articles/optimize-for-intel-avx-using-intel-math-kernel-librarys-basic-linear-algebra-subprograms-blas-with-dgemm-routine or here: http://software.intel.com/en-us/articles/intel-math-kernel-library-intel-mkl-blas-cblas-and-lapack-compilinglinking-functions-fortran-and-cc-calls
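A BLAS routine such as dgemm is built around matrix-matrix products. That can fit this use case because multiplying one fixed matrix by many vectors is a single matrix-matrix product once the vectors are packed as the columns of a second matrix. The following is only a plain C++ sketch of that packing idea, not a BLAS or MKL call; `transform_batch` and its storage layout are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of any BLAS): computes out = M * V, where
// M is a 3x3 matrix stored row-major and V holds N vectors back to back
// (x0 y0 z0 x1 y1 z1 ...), i.e. a 3xN matrix stored column-major.
void transform_batch(const double M[9],
                     const std::vector<double>& V,
                     std::vector<double>& out) {
    assert(V.size() % 3 == 0);          // V must hold whole 3-vectors
    const std::size_t n = V.size() / 3;
    out.resize(3 * n);
    for (std::size_t j = 0; j < n; ++j) {
        const double* v = &V[3 * j];    // j-th input column
        double* o = &out[3 * j];        // j-th output column
        for (int r = 0; r < 3; ++r) {
            // Dot product of matrix row r with the input vector.
            o[r] = M[3 * r + 0] * v[0]
                 + M[3 * r + 1] * v[1]
                 + M[3 * r + 2] * v[2];
        }
    }
}
```

With a real BLAS the loop body would be replaced by one gemm call on the packed buffer. For matrices as small as 3x3, though, the call overhead may outweigh the gain, so it is worth measuring.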


    You're not actually looking for a full linear algebra library, but just portable vector operations.

    Searching for "portable C++ SIMD" generates plenty of relevant hits. One of the most promising is

  • Vc: portable, zero-overhead SIMD library for C++
  • Vc is a free software library to ease explicit vectorization of C++ code. It has an intuitive API and provides portability between different compilers and compiler versions as well as portability between different vector instruction sets.
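To give an idea of what such "portable vector operations" look like without any library, here is a minimal sketch that applies one fixed 3x3 matrix to many 3-vectors using SSE intrinsics where available and a scalar loop elsewhere (for example on ARM). It does not use the Vc API. The function name `transform_soa`, the macro `SKETCH_HAS_SSE`, and the structure-of-arrays layout (separate x, y, and z arrays) are assumptions chosen for illustration.

```cpp
#include <cassert>
#include <cstddef>

#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>
#define SKETCH_HAS_SSE 1
#else
#define SKETCH_HAS_SSE 0
#endif

// Hypothetical helper: applies the row-major 3x3 matrix m to n vectors stored
// as separate x/y/z arrays. Output arrays must not overlap the input arrays.
void transform_soa(const float m[9],
                   const float* x, const float* y, const float* z,
                   float* ox, float* oy, float* oz, std::size_t n) {
    assert(n == 0 || (x && y && z && ox && oy && oz));
    std::size_t i = 0;
#if SKETCH_HAS_SSE
    // The matrix rarely changes, so broadcast each coefficient only once.
    __m128 c[9];
    for (int k = 0; k < 9; ++k) c[k] = _mm_set1_ps(m[k]);
    // Process four vectors per iteration; unaligned loads keep the sketch simple.
    for (; i + 4 <= n; i += 4) {
        const __m128 vx = _mm_loadu_ps(x + i);
        const __m128 vy = _mm_loadu_ps(y + i);
        const __m128 vz = _mm_loadu_ps(z + i);
        _mm_storeu_ps(ox + i, _mm_add_ps(_mm_add_ps(_mm_mul_ps(c[0], vx),
                                                    _mm_mul_ps(c[1], vy)),
                                         _mm_mul_ps(c[2], vz)));
        _mm_storeu_ps(oy + i, _mm_add_ps(_mm_add_ps(_mm_mul_ps(c[3], vx),
                                                    _mm_mul_ps(c[4], vy)),
                                         _mm_mul_ps(c[5], vz)));
        _mm_storeu_ps(oz + i, _mm_add_ps(_mm_add_ps(_mm_mul_ps(c[6], vx),
                                                    _mm_mul_ps(c[7], vy)),
                                         _mm_mul_ps(c[8], vz)));
    }
#endif
    // Scalar path: handles the tail, or everything on targets without SSE.
    for (; i < n; ++i) {
        ox[i] = m[0] * x[i] + m[1] * y[i] + m[2] * z[i];
        oy[i] = m[3] * x[i] + m[4] * y[i] + m[5] * z[i];
        oz[i] = m[6] * x[i] + m[7] * y[i] + m[8] * z[i];
    }
}
```

The structure-of-arrays layout is the main design choice here: with x, y, and z in separate arrays, each SSE register holds the same component of four different vectors, so no shuffles are needed. A library like Vc hides the `#if` and the intrinsics behind one vector type, which is what makes the same source portable across instruction sets.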
