Finds all rows fitting to combinatorial condition

I'm looking for the best way to do this using pythonexcelsqlgoogle sheets - I need to find all rows which fits to k values from list of n values.

For example I have this table called Animals:

| Name     | mammal | move   |  dive |
+----------+--------+--------+-------+
| Giraffe  |  1     |    1   |   0   |
| Frog     |  0     |    1   |   1   |
| Dolphin  |  1     |    1   |   1   |
| Snail    |  0     |    1   |   0   | 
| Bacteria |  0     |    0   |   0   | 

And I want to write a function foo that do behave like that:

foo(tuple of Boolean values, minimum matches)

foo((1,1,1),3) -> Dolphin
foo((1,1,1),2) -> Giraffe, Dolphin, Frog
foo((1,1,1),1) -> Giraffe, Dolphin, Frog, Snail
foo((1,1,0),2) -> Giraffe, Dolphin
foo((0,1,1),2) -> Dolphin, Frog
foo((0,1,1),1) -> Giraffe, Dolphin, Frog, Snail
foo((1,1,1),0) -> Giraffe, Dolphin, Frog, Snail, Bacteria 

What's the best way you think about?


Here's a pure Python 3 solution.

data = [
    ('Giraffe',  1, 1, 0),
    ('Frog',     0, 1, 1),
    ('Dolphin',  1, 1, 1),
    ('Snail',    0, 1, 0),
    ('Bacteria', 0, 0, 0),
]

probes = [
    ((1, 1, 1), 3),
    ((1, 1, 1), 2),
    ((1, 1, 1), 1),
    ((1, 1, 0), 2),
    ((0, 1, 1), 2),
    ((0, 1, 1), 1),
    ((1, 1, 1), 0),
]

def foo(mask, minmatch):
    for name, *row in data:
        if sum(u & v for u, v in zip(mask, row)) >= minmatch:
            yield name

for mask, minmatch in probes:
    print(mask, minmatch, *foo(mask, minmatch))

output

(1, 1, 1) 3 Dolphin
(1, 1, 1) 2 Giraffe Frog Dolphin
(1, 1, 1) 1 Giraffe Frog Dolphin Snail
(1, 1, 0) 2 Giraffe Dolphin
(0, 1, 1) 2 Frog Dolphin
(0, 1, 1) 1 Giraffe Frog Dolphin Snail
(1, 1, 1) 0 Giraffe Frog Dolphin Snail Bacteria

Tested on Python 3.6.0. It uses some syntax that's not available in older versions, but it's easy to adapt it to use older syntax.


This variation runs on older versions of Python. Tested on Python 2.6.6.

from __future__ import print_function

data = [
    ('Giraffe',  1, 1, 0),
    ('Frog',     0, 1, 1),
    ('Dolphin',  1, 1, 1),
    ('Snail',    0, 1, 0),
    ('Bacteria', 0, 0, 0),
]

probes = [
    ((1, 1, 1), 3),
    ((1, 1, 1), 2),
    ((1, 1, 1), 1),
    ((1, 1, 0), 2),
    ((0, 1, 1), 2),
    ((0, 1, 1), 1),
    ((1, 1, 1), 0),
]

def foo(mask, minmatch):
    for row in data:
        if sum(u & v for u, v in zip(mask, row[1:])) >= minmatch:
            yield row[0]

for mask, minmatch in probes:
    matches = list(foo(mask, minmatch))
    print(mask, minmatch, matches)

output

(1, 1, 1) 3 ['Dolphin']
(1, 1, 1) 2 ['Giraffe', 'Frog', 'Dolphin']
(1, 1, 1) 1 ['Giraffe', 'Frog', 'Dolphin', 'Snail']
(1, 1, 0) 2 ['Giraffe', 'Dolphin']
(0, 1, 1) 2 ['Frog', 'Dolphin']
(0, 1, 1) 1 ['Giraffe', 'Frog', 'Dolphin', 'Snail']
(1, 1, 1) 0 ['Giraffe', 'Frog', 'Dolphin', 'Snail', 'Bacteria']

如果表是熊猫数据框:

def foo(df, val, n_match):
    results = []
    for r in  df.values:
        if sum(val & r[1:]) >= n_match:
            results.append(r[0])
    print("foo(%s), %d -> %s") % (val, n_match, ' '.join(results))

I will try to use python with pandas

assume "Name" column is pandas index:

def foo(df, bool_index, minimum_matches):
    picked_column_index = [ idx for (idx, i) in enumerate(bool_index) if i] # select where "1" is
    picked_df = df.iloc[:, picked_column_index] #select column by location
    matched_row_bool = picked_df.sum(axis=1) >= minimum_matches
    return picked_df[matched_row_bool].index.tolist()

df is a pandas dataframe read from the table (Animals) maybe:

df = pandas.read_csv('animials_csv_file_path')

or

df = pandas.read_excel('animials_xls_file_path')

It will return a list include matched name

链接地址: http://www.djcxy.com/p/53564.html

上一篇: 为什么* x,在Python 3中解压缩映射对象?

下一篇: 查找适合组合条件的所有行