Run function exactly once for each row in a Pandas dataframe
If I have a function
def do_irreversible_thing(a, b):
    print(a, b)
And a dataframe, say
df = pd.DataFrame([(0, 1), (2, 3), (4, 5)], columns=['a', 'b'])
What's the best way to run the function exactly once for each row in a pandas dataframe? As pointed out in other questions, something like df.apply will call the function twice for the first row. Even using numpy
np.vectorize(do_irreversible_thing)(df.a, df.b)
causes the function to be called twice on the first row, as will df.T.apply() or df.apply(..., axis=1).
Is there a faster or cleaner way to call the function with every row than this explicit loop?
for idx, a, b in df.itertuples():
    do_irreversible_thing(a, b)
It's unclear what your function is doing, but to apply a function to each row you can pass axis=1 to apply, which calls your function row-wise, and pass in the column elements of interest:
In [155]:
def foo(a, b):
    return a * b
df = pd.DataFrame([(0, 1), (2, 3), (4, 5)], columns=['a', 'b'])
df.apply(lambda x: foo(x['a'], x['b']), axis=1)
Out[155]:
0     0
1     6
2    20
dtype: int64
However, as long as your function does not depend on the df mutating on each row, you can just use a vectorised method to operate on the entire column:
In [156]:
df['a'] * df['b']
Out[156]:
0     0
1     6
2    20
dtype: int64
The reason is that vectorised operations scale better, whilst apply is just syntactic sugar for iterating over your df, so it is essentially a for loop.
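As a rough check of the point above, here is a minimal sketch that computes the same product both ways and compares the results. It reuses the foo function and the example dataframe from this answer; the variable names row_wise and vectorised are just illustrative.

```python
import pandas as pd

df = pd.DataFrame([(0, 1), (2, 3), (4, 5)], columns=['a', 'b'])

def foo(a, b):
    return a * b

# Row-wise: apply calls the lambda once per row (a Python-level loop).
row_wise = df.apply(lambda row: foo(row['a'], row['b']), axis=1)

# Vectorised: a single column-level multiplication.
vectorised = df['a'] * df['b']

# Both approaches should produce the same values.
print(row_wise.tolist() == vectorised.tolist())
```

This only works for functions that return a value and have no side effects; something like do_irreversible_thing cannot be vectorised this way.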
The way I do it (because I also don't like the idea of looping with df.itertuples) is:
df.apply(do_irreversible_thing, axis=1)
and then your function should be like:
def do_irreversible_thing(x):
    print(x.a, x.b)
This way you should be able to run your function over each row.
OR
If you can't modify your function, you could apply it like this:
df.apply(lambda x: do_irreversible_thing(x['a'], x['b']), axis=1)
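If you want to confirm that the function really runs exactly once per row, whichever approach you pick, one option is to record each call and inspect the record afterwards. Below is a minimal sketch using the explicit loop from the question. The calls list is a hypothetical stand-in for the irreversible side effect, and the index=False and name=None arguments to itertuples are my choice here (they make it yield plain (a, b) tuples without the index).

```python
import pandas as pd

df = pd.DataFrame([(0, 1), (2, 3), (4, 5)], columns=['a', 'b'])

calls = []  # hypothetical recorder standing in for the real side effect

def do_irreversible_thing(a, b):
    calls.append((a, b))

# index=False drops the index; name=None yields plain tuples.
for a, b in df.itertuples(index=False, name=None):
    do_irreversible_thing(a, b)

# One recorded call per row, in row order.
print(calls)
print(len(calls) == len(df))
```

The same recorder can be dropped into the df.apply version to see how many times your pandas version actually calls the function.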