Is it possible to compute pow(10,x) at compile time? I've got a processor without floating-point support and slow integer division. I'm trying to perform as many calculations as possible at compile time. I can dramatically speed up one particular function if I pass both x and C/pow(10,x) as arguments (x and C are always constant integers, but they are different constants for each call). I'm wondering whether I can make these function calls less error-prone by introducing a macro that does the 1/pow(10,x) automatically, instead of forcing the programmer to calculate it. Is there a preprocessor trick? Can I force the compiler to optimize
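A minimal sketch of one preprocessor-only approach, assuming x is always written as a plain decimal literal (0, 1, 2, ...) at the call site: token-paste it onto a small table of named constants, so C / POW10(x) stays an integer constant expression the compiler folds at compile time. The macro names are illustrative, not from the original question.

    /* Hypothetical sketch: maps a literal exponent to a power-of-ten constant
     * entirely in the preprocessor. x must be a plain literal, not an
     * expression or another macro. */
    #define POW10_0 1ULL
    #define POW10_1 10ULL
    #define POW10_2 100ULL
    #define POW10_3 1000ULL
    #define POW10_4 10000ULL
    #define POW10_5 100000ULL
    #define POW10_6 1000000ULL
    #define POW10(x) POW10_##x

    /* usage: fast_func(3, C / POW10(3));  -- both arguments are compile-time constants */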
From time to time I read that Fortran is, or can be, faster than C for heavy calculations. Is that really true? I must admit that I hardly know Fortran, but the Fortran code I have seen so far did not show that the language has features that C doesn't have. If it is true, please tell me why. Please don't tell me which languages or libraries are good for number crunching; I don't intend to write an application or a library for that, I'm just curious.

The languages have similar feature sets. The performance difference comes from the fact that Fortran says aliasing is not allowed, unless an EQUIVALENCE statement is used. Any code that has aliasing
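To make the aliasing point concrete, here is a small illustrative C99 sketch (my own, not from the question): with restrict, the C compiler is allowed to assume the two arrays do not overlap, which is the assumption a Fortran compiler may make for array arguments by default.

    /* Without 'restrict', the compiler must assume x and y may overlap and
     * reload x[i] after every store to y[i]; with 'restrict', it may keep
     * values in registers and vectorize more aggressively. */
    void axpy(int n, double a, const double *restrict x, double *restrict y)
    {
        for (int i = 0; i < n; i++)
            y[i] += a * x[i];
    }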
With GCC 5.3 the following code compiled with -O3 -mfma

    float mul_add(float a, float b, float c) { return a*b + c; }

produces the following assembly

    vfmadd132ss %xmm1, %xmm2, %xmm0
    ret

I noticed GCC doing this with -O3 already in GCC 4.8. Clang 3.7 with -O3 -mfma produces

    vmulss %xmm1, %xmm0, %xmm0
    vaddss %xmm2, %xmm0, %xmm0
    retq

but Clang 3.7 with -Ofast -mfma produces the same code as GCC does with -O3. I am surprised that GCC already does this at -O3, since from this answer it appears
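For comparison, a small sketch (not part of the question) that requests the fusion explicitly through the C99 fma family, independent of whatever contraction policy -O3 or -Ofast implies:

    #include <math.h>

    /* Explicit fused multiply-add: a*b + c with a single rounding, by
     * definition, regardless of whether the compiler is allowed to contract
     * the expression on its own. With FMA code generation enabled (-mfma)
     * this should compile to a single vfmadd instruction. */
    float mul_add_fused(float a, float b, float c)
    {
        return fmaf(a, b, c);
    }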
I'd like to optimize the following snippet using SSE instructions if possible:

    /*
     * the data structure
     */
    typedef struct v3d v3d;

    struct v3d {
        double x;
        double y;
        double z;
    } tmp = { 1.0, 2.0, 3.0 };

    /*
     * the part that should be "optimized"
     */
    tmp.x /= 4.0;
    tmp.y /= 4.0;
    tmp.z /= 4.0;

Is this possible at all? I've used SIMD extensions under Windows, but not yet under Linux.

That said, you should be able to make use of the DIVPS SSE operation, which will divide a 4-float vector by another 4-float
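Since the structure holds doubles rather than floats, the packed-double form of that instruction (DIVPD, two doubles per SSE register) is the one that applies here. A minimal sketch using SSE2 intrinsics (my own illustration, with a hypothetical helper name, not code from the question):

    #include <emmintrin.h>   /* SSE2 intrinsics */

    struct v3d { double x, y, z; };

    /* Divide x and y as one packed pair (DIVPD) and z as a scalar.
     * For the constant 4.0, multiplying by 0.25 would be exact and usually
     * cheaper, but division is shown to match the question. Assumes x and y
     * are laid out contiguously, as they are in this plain struct. */
    static void v3d_div_by_4(struct v3d *v)
    {
        __m128d xy = _mm_loadu_pd(&v->x);          /* loads v->x and v->y */
        xy = _mm_div_pd(xy, _mm_set1_pd(4.0));
        _mm_storeu_pd(&v->x, xy);
        v->z /= 4.0;
    }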
I'm doing some trigonometry calculations in C/C++ and am running into problems with rounding errors. For example, on my Linux system:

    #include <stdio.h>
    #include <math.h>

    int main(int argc, char *argv[])
    {
        printf("%e\n", sin(M_PI));
        return 0;
    }

This program gives the following output:

    1.224647e-16

when the correct answer is of course 0. How much rounding error can I expect when using trig functions? How can I best handle that error? I'm familiar with the units-in-the-last-place technique for comparing floating-point numbers,
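The printed value is not a defect in sin() itself: M_PI is only the double closest to the real π, and sin(M_PI) ≈ π − M_PI ≈ 1.2246e−16, so the library is answering the question that was actually asked. One common way to cope, shown as an illustrative sketch rather than anything from the question, is to compare against a tolerance instead of expecting exact zeros:

    #include <math.h>

    /* Treat a and b as equal if they differ by at most an absolute tolerance
     * or a tolerance relative to their magnitudes, whichever is larger. */
    static int nearly_equal(double a, double b, double rel_tol, double abs_tol)
    {
        double diff  = fabs(a - b);
        double scale = fmax(fabs(a), fabs(b));
        return diff <= fmax(abs_tol, rel_tol * scale);
    }

    /* e.g. nearly_equal(sin(M_PI), 0.0, 0.0, 1e-12) evaluates to true */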
Intervals of floating-point bounds can be used to over-approximate sets of reals, as long as the upper bound of any result interval is computed in round-upwards and the lower bound in round-downwards. One recommended trick is to actually compute the negation of the lower bound. This makes it possible to keep the FPU in round-upwards at all times (see for instance the "Handbook of Floating-Point Arithmetic", 2.9.2). This works for addition and for multiplication. The square-root operation, on the other hand, is not symmetrical the way addition and multiplication are. It seems to me that, in order to compute sqrtRD for the lower bound, the idiom below, despite its convolutedness, works with IEEE 754 double precision and FLT_EVAL_METHOD defined to 0
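For the addition case that the trick is stated for, a minimal sketch of the negated-lower-bound representation (my own illustration; the question itself is about the square-root case): storing [lo, hi] as (−lo, hi) lets both bounds be computed with the FPU left in round-upwards, because RU(−a − b) = −RD(a + b).

    #include <fenv.h>
    #pragma STDC FENV_ACCESS ON

    /* Interval [lo, hi] stored as (-lo, hi); assumes fesetround(FE_UPWARD)
     * has been called and remains in effect. */
    typedef struct { double neg_lo, hi; } interval;

    static interval interval_add(interval a, interval b)
    {
        interval r;
        r.neg_lo = a.neg_lo + b.neg_lo;  /* RU(-(lo_a + lo_b)) = -RD(lo_a + lo_b) */
        r.hi     = a.hi + b.hi;          /* RU(hi_a + hi_b) */
        return r;
    }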
The (exponentially) scaled complementary error function, commonly designated erfcx, is defined mathematically as erfcx(x) := exp(x²) · erfc(x). It frequently occurs in diffusion problems in physics as well as chemistry. While some mathematical environments, such as MATLAB and GNU Octave, provide this function, it is absent from the C standard math library, which only provides erf() and erfc(). While it is possible to implement one's own erfcx() directly based on the mathematical definition, this only works over a limited input domain, because in the positive half-plane erfc() underflows for arguments of moderate magnitude while exp(x²) overflows, as noted in this question.
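For reference, the straight-from-the-definition version the question warns about, as a sketch to make the failure mode concrete: it is fine for small |x| but breaks down once erfc(x) underflows or exp(x*x) overflows.

    #include <math.h>

    /* Naive erfcx: faithful to the definition, but in double precision
     * exp(x*x) overflows for x above roughly 26.6 and erfc(x) underflows
     * to zero around x of roughly 27, so the usable domain is limited. */
    double erfcx_naive(double x)
    {
        return exp(x * x) * erfc(x);
    }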
I am currently looking into ways of using the fast single-precision floating-point reciprocal capability of various modern processors to compute a starting approximation for a 64-bit unsigned integer division based on fixed-point Newton-Raphson iterations. It requires computation of 2^64 / divisor, as accurately as possible, where the initial approximation must be smaller than, or equal to, the mathematical result, as required by the fixed-point iterations that follow. This means the computation needs to provide an underestimate. I currently have the following code, which works well based on extensive testing:

    #include <stdint.h> // import uint64_t
    #include <math.h>   // import nextafterf()

    uint
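Since the requirement is that the approximation never exceed the mathematical quotient, a small testing aid fits the "extensive testing" mentioned above. This sketch is my own addition, not the questioner's (truncated) code, and relies on the GCC/Clang __uint128_t extension; it assumes a divisor of at least 2 so the exact result fits in 64 bits.

    #include <stdint.h>

    /* Exact reference value floor(2^64 / d) for validating a candidate
     * underestimate: a candidate approx(d) is acceptable for the
     * Newton-Raphson start only if approx(d) <= exact_2_64_div(d). */
    static uint64_t exact_2_64_div(uint64_t d)   /* requires d >= 2 */
    {
        return (uint64_t)((((__uint128_t)1) << 64) / d);
    }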
The complementary error function, erfc, is a special function closely related to the standard normal distribution. It is frequently used in statistics and the natural sciences (e.g. diffusion problems) where the "tails" of this distribution need to be considered, and where use of the error function, erf, is therefore not suitable. The complementary error function was made available in the ISO C99 standard math library as the functions erfcf, erfc, and erfcl; these were subsequently adopted into ISO C++ as well. Source code for it can therefore readily be found in open-source implementations of that library, for example in glibc. However, many existing implementations are scalar in nature, while modern processor hardware is SIMD-oriented (explicitly, as in x86
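As a baseline for what "scalar in nature" means in practice, here is a trivial element-wise loop over the C99 erfcf (my own illustration): the loop itself is vectorizable, but the call into a typical branchy scalar libm implementation generally is not, which is why a branch-free, SIMD-friendly formulation of the function is attractive.

    #include <math.h>
    #include <stddef.h>

    /* Scalar baseline: one libm call per element. A SIMD-oriented version
     * would instead evaluate a branch-free polynomial or rational
     * approximation on whole vectors of inputs at once. */
    void erfcf_array(float *dst, const float *src, size_t n)
    {
        for (size_t i = 0; i < n; i++)
            dst[i] = erfcf(src[i]);
    }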
I am developing a C application which needs floating-point determinism. I would also like the floating-point operations to be fairly fast. This includes standard transcendental functions not specified by IEEE 754, such as sine and logarithm. The software floating-point implementations I have considered are relatively slow compared to hardware floating point, so I am considering simply rounding away the one or two least-significant bits of every answer. The loss of precision is an acceptable trade-off for my application, but is that enough to guarantee deterministic results across platforms? All floating-point values will be doubles. I realize that order of operations is another potential source of variation in floating-point results.
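A sketch of one literal reading of "rounding away" the low bits (my own illustration, and strictly speaking truncation rather than rounding): clear the bottom mantissa bits through a bit-level copy. Whether discarding two bits is actually sufficient to hide cross-platform differences, especially for library-supplied transcendental functions, is exactly what the question asks.

    #include <stdint.h>
    #include <string.h>

    /* Zero the two least-significant mantissa bits of a double by
     * type-punning through memcpy (well-defined, unlike pointer casts). */
    static double drop_low_bits(double x)
    {
        uint64_t bits;
        memcpy(&bits, &x, sizeof bits);
        bits &= ~(uint64_t)0x3;
        memcpy(&x, &bits, sizeof bits);
        return x;
    }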