Why is x**3 slower than x*x*x?

This question already has an answer here:

  • Speed of calculating powers (in python)

As per this answer, it's because the implementation of exponentiation has some overhead that multiplication does not. However, naive multiplication gets slower and slower as the exponent increases. An empirical demonstration:

     In [3]: x = np.random.rand(1e6)
    
     In [15]: %timeit x**2
     100 loops, best of 3: 11.9 ms per loop
    
     In [16]: %timeit x*x
     100 loops, best of 3: 12.7 ms per loop
    
     In [17]: %timeit x**3
     10 loops, best of 3: 132 ms per loop
    
     In [18]: %timeit x*x*x
     10 loops, best of 3: 27.2 ms per loop
    
     In [19]: %timeit x**4
     10 loops, best of 3: 132 ms per loop
    
     In [20]: %timeit x*x*x*x
     10 loops, best of 3: 42.4 ms per loop
    
     In [21]: %timeit x**10
     10 loops, best of 3: 132 ms per loop
    
     In [22]: %timeit x*x*x*x*x*x*x*x*x*x
     10 loops, best of 3: 137 ms per loop
    
     In [24]: %timeit x**15
     10 loops, best of 3: 132 ms per loop
    
     In [25]: %timeit x*x*x*x*x*x*x*x*x*x*x*x*x*x*x
     1 loops, best of 3: 212 ms per loop
    

    Note that the exponentiation time stays more or less constant, except for the x**2 case, which I suspect is special-cased, while multiplication gets slower and slower. It seems you could exploit this to get faster integer exponentiation... for example:

    In [26]: %timeit x**16
    10 loops, best of 3: 132 ms per loop
    
    In [27]: %timeit x*x*x*x*x*x*x*x*x*x*x*x*x*x*x*x
    1 loops, best of 3: 225 ms per loop
    
    In [28]: def tosixteenth(x):
       ....:     x2 = x*x
       ....:     x4 = x2*x2
       ....:     x8 = x4*x4
       ....:     x16 = x8*x8
       ....:     return x16
       ....:
    
    In [29]: %timeit tosixteenth(x)
    10 loops, best of 3: 49.5 ms per loop
    

    It seems you could apply this technique generically by splitting any integer exponent into a sum of powers of two, computing x raised to each power of two as above, and multiplying the results together:

    In [93]: %paste
    def smartintexp(x, exp):
        """Integer power by repeated squaring (binary exponentiation)."""
        result = np.ones(len(x))
        curexp = np.array(x)      # copy of x; holds x, x**2, x**4, ...
        while True:
            if exp % 2 == 1:      # low bit of exp is set: fold current power in
                result *= curexp
            exp >>= 1
            if not exp:
                break
            curexp *= curexp      # square for the next bit
        return result
    ## -- End pasted text --
    
    In [94]: x
    Out[94]:
    array([ 0.0163407 ,  0.57694587,  0.47336487, ...,  0.70255032,
            0.62043303,  0.0796748 ])
    
    In [99]: x**21
    Out[99]:
    array([  3.01080670e-38,   9.63466181e-06,   1.51048544e-07, ...,
             6.02873388e-04,   4.43193256e-05,   8.46721060e-24])
    
    In [100]: smartintexp(x, 21)
    Out[100]:
    array([  3.01080670e-38,   9.63466181e-06,   1.51048544e-07, ...,
             6.02873388e-04,   4.43193256e-05,   8.46721060e-24])
    
    In [101]: %timeit x**21
    10 loops, best of 3: 132 ms per loop
    
    In [102]: %timeit smartintexp(x, 21)
    10 loops, best of 3: 70.7 ms per loop
    

    It's fast for small powers of two:

    In [106]: %timeit x**32
    10 loops, best of 3: 131 ms per loop
    
    In [107]: %timeit smartintexp(x, 32)
    10 loops, best of 3: 57.4 ms per loop
    

    But gets slower as the exponent gets larger:

    In [97]: %timeit x**63
    10 loops, best of 3: 133 ms per loop
    
    In [98]: %timeit smartintexp(x, 63)
    10 loops, best of 3: 110 ms per loop
    

    And it is not faster for the large worst cases:

    In [115]: %timeit x**511
    10 loops, best of 3: 135 ms per loop
    
    In [114]: %timeit smartintexp(x, 511)
    10 loops, best of 3: 192 ms per loop
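
    The worst cases are exponents whose binary representation is all ones: then `smartintexp` pays for a multiply into `result` at every bit on top of the squarings. A small helper (my own, not from the answer, but matching the loop above) counts the array multiplications performed:

    ```python
    def mult_count(exp):
        # one squaring of curexp per bit after the first,
        # plus one "result *= curexp" per set bit of the exponent
        squarings = exp.bit_length() - 1
        multiplies = bin(exp).count("1")
        return squarings + multiplies

    print(mult_count(512))  # 10: 9 squarings + 1 multiply
    print(mult_count(511))  # 17: 8 squarings + 9 multiplies
    ```

    At roughly 13 ms per array multiplication (from the timings above), 17 multiplications predict around 220 ms for exponent 511, in line with the 192 ms measured, while `x**511` stays at its flat ~135 ms.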
    

    As a side note: if you are calculating powers and are worried about speed:

    x = np.random.rand(5e7)
    
    %timeit x*x*x
    1 loops, best of 3: 522 ms per loop
    
    %timeit np.einsum('i,i,i->i',x,x,x)
    1 loops, best of 3: 288 ms per loop
    

    Why einsum is faster is still an open question of mine, although it is likely because einsum can use SSE2, while numpy's ufuncs will not until numpy 1.8.

    Computing in place is even faster:

    from numba import autojit  # numba's (older) autojit API

    def calc_power(arr):
        # cube each element in place (Python 2 xrange, as in the original)
        for i in xrange(arr.shape[0]):
            arr[i] = arr[i]*arr[i]*arr[i]
    numba_power = autojit(calc_power)
    
    %timeit numba_power(x)
    10 loops, best of 3: 51.5 ms per loop
    
    %timeit np.einsum('i,i,i->i',x,x,x,out=x)
    10 loops, best of 3: 111 ms per loop
    
    %timeit np.power(x,3,out=x)
    1 loops, best of 3: 609 ms per loop
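
    If numba is not available, the same in-place idea can be sketched with plain numpy at the cost of a single temporary array (the function name here is my own, for illustration):

    ```python
    import numpy as np

    def cube_inplace(arr):
        # arr**3 written back into arr: one temporary for arr*arr,
        # then a ufunc multiply with out= to avoid a second temporary
        tmp = arr * arr
        np.multiply(arr, tmp, out=arr)
        return arr
    ```

    This still allocates `tmp`, so it is not as memory-tight as the numba loop, but it avoids the second temporary that the naive `x*x*x` expression creates.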
    

    I expect it is because x**y must handle the generic case where both x and y are floats. Mathematically, we can write x**y = exp(y*log(x)). Following your example, I find:

    x = np.random.rand(1e6)
    %timeit x**3
    10 loops, best of 3: 178 ms per loop
    
    %timeit np.exp(3*np.log(x))
    10 loops, best of 3: 176 ms per loop
    

    I have not checked the actual numpy code, but it must be doing something like this internally.
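
    As a quick sanity check of the identity (my own snippet, not from the answer), the two routes agree to floating-point tolerance for positive inputs, where the log is defined:

    ```python
    import numpy as np

    x = np.random.rand(1000)              # values in [0, 1), so log(x) is defined
    direct = x ** 3
    via_exp_log = np.exp(3 * np.log(x))   # x**y = exp(y*log(x))
    print(np.allclose(direct, via_exp_log))  # True
    ```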
