Fastest way to check if a value exist in a list

I'm searching for the fastest way to know if a value exists in a list (a list with millions of values in it) and what its index is? I know that all values in the list are unique like my example.

The first method I try is (3.8sec in my real code):

a = [4,2,3,1,5,6]

if a.count(7) == 1:
    b=a.index(7)
    "Do something with variable b"

The second method I try is (2x faster:1.9sec on my real code):

a = [4,2,3,1,5,6]

try:
    b=a.index(7)
except ValueError:
    "Do nothing"
else:
    "Do something with variable b"

Proposed methods from Stackoverflow user (2.74sec on my real code):

a = [4,2,3,1,5,6]
if 7 in a:
    a.index(7)

In my real code, first method takes 3.81sec and the second method takes 1.88sec. It's a good improvement but:

I'm a beginner with Python/scripting and I want to know if a fastest way exists to do the same things and save more process time?

More specific explication for my application:

In the API of blender a can access to a list of particles:

particles = [1,2,3,4...etc.]

From there, I can access to its location:

particles[x].location = [x,y,z]

And I test for each particle if a neighbour exists by searching in the location of each particle like:

if [x+1,y,z] in particles.location
    "find the identity of this neighbour particles in x:the index 
    of the particles array"
    particles.index([x+1,y,z])

7 in a

Clearest and fastest way to do it.

You can also consider using a set , but constructing that set from your list may take more time than faster membership testing will save. The only way to be certain is to benchmark well. (this also depends on what operations you require)


As stated by others, in can be very slow for large lists. Here are some comparisons of the performances for in , set and bisect . Note the time (in second) is in log scale.

在这里输入图像描述

Code for testing:

import random
import bisect
import matplotlib.pyplot as plt
import math
import time

def method_in(a,b,c):
    start_time = time.time()
    for i,x in enumerate(a):
        if x in b:
            c[i] = 1
    return(time.time()-start_time)   

def method_set_in(a,b,c):
    start_time = time.time()
    s = set(b)
    for i,x in enumerate(a):
        if x in s:
            c[i] = 1
    return(time.time()-start_time)

def method_bisect(a,b,c):
    start_time = time.time()
    b.sort()
    for i,x in enumerate(a):
        index = bisect.bisect_left(b,x)
        if index < len(a):
            if x == b[index]:
                c[i] = 1
    return(time.time()-start_time)

def profile():
    time_method_in = []
    time_method_set_in = []
    time_method_bisect = []

    Nls = [x for x in range(1000,20000,1000)]
    for N in Nls:
        a = [x for x in range(0,N)]
        random.shuffle(a)
        b = [x for x in range(0,N)]
        random.shuffle(b)
        c = [0 for x in range(0,N)]

        time_method_in.append(math.log(method_in(a,b,c)))
        time_method_set_in.append(math.log(method_set_in(a,b,c)))
        time_method_bisect.append(math.log(method_bisect(a,b,c)))

    plt.plot(Nls,time_method_in,marker='o',color='r',linestyle='-',label='in')
    plt.plot(Nls,time_method_set_in,marker='o',color='b',linestyle='-',label='set')
    plt.plot(Nls,time_method_bisect,marker='o',color='g',linestyle='-',label='bisect')
    plt.xlabel('list size', fontsize=18)
    plt.ylabel('log(time)', fontsize=18)
    plt.legend(loc = 'upper left')
    plt.show()

a = [1,2,3,4,'a','b','c']
return 'a' in a

我相信这是知道所选值是否在数组中的最快方法。

链接地址: http://www.djcxy.com/p/70690.html

上一篇: 如何检查列表中的所有元素是否与条件匹配?

下一篇: 检查列表中是否存在值的最快方法