Comparing runtime of standard list comprehension vs. NumPy

2018-06-30 17:19:25

I'm experimenting with NumPy to see how and where it is faster than using generic list comprehensions in Python. Here's a standard coding question I'm using for this experiment.

Find the sum of all the multiples of 3 or 5 below 1000000.

I have written three functions to compute this number.

def fA(M):
    sum = 0
    for x in range(M):
        if x % 3 == 0 or x % 5 == 0:
           sum += x
    return sum

def fB(M):
    multiples_3 = range(0, M, 3)
    multiples_5 = range(0, M, 5)
    multiples_15 = range(0, M, 15)
    return sum(multiples_3) + sum(multiples_5) - sum(multiples_15)

def fC(M):
    arr = np.arange(M)
    return np.sum(arr[np.logical_or(arr % 3 == 0, arr % 5 == 0)])

I first did a quick sanity check to see that the three functions produced the same answer.

I then used timeit to compare the runtimes for the three functions.

%timeit -n 100 fA(1000000)
100 loops, best of 3: 182 ms per loop

%timeit -n 100 fB(1000000)
100 loops, best of 3: 14.4 ms per loop

%timeit -n 100 fC(1000000)
100 loops, best of 3: 44 ms per loop

It's no surprise that fA is the slowest. But why is fB so much better than fC? Is there a better way to compute this answer using NumPy?

I don't think size is an issue here. In fact, if I change the 1e6 to 1e9, fC becomes even slower when compared to fB.

fB is so much faster than fC because fC is not the NumPy equivalent of fB . fC is the NumPy equivalent of fA . This is the NumPy equivalent of fB :

def fD(M):
    multiples_3 = np.arange(0, M, 3)
    multiples_5 = np.arange(0, M, 5)
    multiples_15 = np.arange(0, M, 15)
    return multiples_3.sum() + multiples_5.sum() - multiples_15.sum()

It runs way faster:

In [4]: timeit fB(1000000)
100 loops, best of 3: 9.96 ms per loop
In [5]: timeit fD(1000000)
1000 loops, best of 3: 637 µs per loop

In fB you are constructing the ranges with the exact multiples you want. Their sizes become smaller from 3 to 5 to 15 and thus each takes less time to construct than the one before, after they are constructed you only need to take the sum and do some arithmetic.

In fC you are constructing a 100000 element array, the size isn't really the issue as much as the two modulo comparisons you are doing which must look at every single element in the array. This takes the lion's share of the execution time (about 90 %) for fC .

You're only really using numpy there to generate an array. You'd see a much bigger difference if you were trying to perform operations on arrays as opposed to performing them on lists or tuples. With regards to that particular problem, take a look at the function fD in the code below, which just calculates how many multiples there should be in each range and then calculates their sum, rather than generating the array. Actually, if you run the below snippet, you'll see how the times change in function of M. Also, fC breaks down for M >= 100000. I couldn't tell you why.

import numpy as np
from time import time

def fA(M):
    sum = 0
    for x in range(M):
        if x % 3 == 0 or x % 5 == 0:
           sum += x
    return sum

def fB(M):
    multiples_3 = range(0, M, 3)
    multiples_5 = range(0, M, 5)
    multiples_15 = range(0, M, 15)
    return sum(multiples_3) + sum(multiples_5) - sum(multiples_15)

def fC(M):
    arr = np.arange(M)
    return np.sum(arr[np.logical_or(arr % 3 == 0, arr % 5 == 0)])

def fD(M):
    return sum_mult(M,3)+sum_mult(M,5)-sum_mult(M,15)

def sum_mult(M,n):
    instances=(M-1)/n
    check=len(range(n,M,n))
    return (n*instances*(instances+1))/2


for x in range(5,20):
    print "*"*20
    M=2**x
    print M
    answers=[]
    T=[]
    for f in (fA,fB,fC,fD):
        ts=time()
        answers.append(f(M))
        for i in range(20):
            f(M)
        T.append(time()-ts)
    if not all([x==answers[0] for x in answers]):
        print "Warning! Answers do not match!",answers
    print T

链接地址: http://www.djcxy.com/p/85700.html

上一篇: numpy的QR出现比scipy快

下一篇: 比较标准列表理解与NumPy的运行时间