How to iterate over rows in a DataFrame in Pandas?
I have a DataFrame from pandas:
import pandas as pd
inp = [{'c1':10, 'c2':100}, {'c1':11,'c2':110}, {'c1':12,'c2':120}]
df = pd.DataFrame(inp)
print(df)
Output:
   c1   c2
0  10  100
1  11  110
2  12  120
Now I want to iterate over the rows of the above frame. For every row, I want to be able to access its elements (the values in the cells) by column name. For example, I would like to have something like this:
for row in df.rows:
    print(row['c1'], row['c2'])
Is it possible to do that in pandas?
I found a similar question, but it does not give me the answer I need. For example, it suggests using:
for date, row in df.T.iteritems():
or
for row in df.iterrows():
But I do not understand what the row object is and how I can work with it.
iterrows is a generator that yields both the index and the row:
In [18]: for index, row in df.iterrows():
   ....:     print(row['c1'], row['c2'])
   ....:
10 100
11 110
12 120
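To make the row object from the question concrete: each item yielded by iterrows() is an (index, Series) pair, and that Series is labelled by the column names, which is why row['c1'] works. A minimal sketch using the frame from the question (the list `seen` is only there for illustration):

```python
import pandas as pd

df = pd.DataFrame([{'c1': 10, 'c2': 100},
                   {'c1': 11, 'c2': 110},
                   {'c1': 12, 'c2': 120}])

seen = []  # illustrative: collects what the loop yields
for index, row in df.iterrows():
    # row is a pandas Series; its labels are the column names of df
    print(type(row).__name__, list(row.index))
    # values can be read by column name, like a dict
    seen.append((index, row['c1'], row['c2']))

print(seen)
```

The index value comes first in each pair, so unpacking it as `for index, row in ...` is what gives you a usable row; iterating with a single loop variable gives you the whole tuple instead.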
While iterrows() is a good option, itertuples() can sometimes be much faster:
from numpy.random import randn, randint
df = pd.DataFrame({'a': randn(1000), 'b': randn(1000), 'N': randint(100, 1000, (1000)), 'x': 'x'})
%timeit [row.a * 2 for idx, row in df.iterrows()]
# => 10 loops, best of 3: 50.3 ms per loop
%timeit [row[1] * 2 for row in df.itertuples()]
# => 1000 loops, best of 3: 541 µs per loop
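The %timeit lines above only work in IPython. If you want to repeat the comparison in a plain script, something like the following should do; it is only a sketch, the function names are made up for this example, and the absolute numbers will depend on your machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seeded only so the data is repeatable
df = pd.DataFrame({
    'a': rng.standard_normal(1000),
    'b': rng.standard_normal(1000),
    'N': rng.integers(100, 1000, 1000),
    'x': 'x',
})

def with_iterrows():
    # builds one Series per row, which is the expensive part
    return [row['a'] * 2 for _, row in df.iterrows()]

def with_itertuples():
    # yields lightweight namedtuples instead of Series
    return [row.a * 2 for row in df.itertuples()]

t_rows = timeit.timeit(with_iterrows, number=3)
t_tuples = timeit.timeit(with_itertuples, number=3)
print(f"iterrows:   {t_rows:.4f} s for 3 runs")
print(f"itertuples: {t_tuples:.4f} s for 3 runs")
```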
To iterate through a DataFrame's rows in pandas, one can use:
DataFrame.iterrows()
for index, row in df.iterrows():
    print(row["c1"], row["c2"])
DataFrame.itertuples()
for row in df.itertuples(index=True, name='Pandas'):
    print(getattr(row, "c1"), getattr(row, "c2"))
itertuples() is supposed to be faster than iterrows().
But be aware, according to the docs (pandas 0.21.1 at the moment):
iterrows: dtype might not match from row to row
Because iterrows returns a Series for each row, it does not preserve dtypes across the rows (dtypes are preserved across columns for DataFrames).
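A small illustration of the dtype point, modelled on the example in the pandas documentation: with one integer column and one float column, the row Series has to pick a single dtype, so the integer value comes back as a float. The itertuples() comparison at the end is added here just for contrast.

```python
import pandas as pd

df = pd.DataFrame([[1, 1.5]], columns=['int', 'float'])

# dtype of the column as stored in the frame
print(df['int'].dtype)

# the same value seen through iterrows(): the row is one Series,
# so both values are upcast to a common dtype
_, row = next(df.iterrows())
print(row.dtype, repr(row['int']))

# itertuples() reads each value from its own column, so no upcast happens
tup = next(df.itertuples(index=False))
print(repr(tup[0]), repr(tup[1]))
```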
iterrows: Do not modify rows
You should never modify something you are iterating over. This is not guaranteed to work in all cases. Depending on the data types, the iterator returns a copy and not a view, and writing to it will have no effect.
Use DataFrame.apply() instead:
new_df = df.apply(lambda x: x * 2)
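To see why writing to the row is unreliable, here is a sketch with a frame that mixes an integer and a float column (chosen on purpose, because in that case the row is built from a copy of the data). The assignment inside the loop changes only the temporary Series, and the frame stays as it was; the apply() call then produces the doubled values as a new frame.

```python
import pandas as pd

df = pd.DataFrame({'c1': [10, 11, 12], 'c2': [1.5, 2.5, 3.5]})

for _, row in df.iterrows():
    row['c1'] = 0  # modifies the temporary row Series, not df

print(df['c1'].tolist())  # the original values are still there

# build a new frame instead of editing rows in place
new_df = df.apply(lambda x: x * 2)
print(new_df)
```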
itertuples:
The column names will be renamed to positional names if they are invalid Python identifiers, repeated, or start with an underscore. With a large number of columns (>255), regular tuples are returned.
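The renaming is easy to trip over, so here is a short sketch of it. The column names 'my col' and '_hidden' are made up for this example; one contains a space and the other starts with an underscore, so neither can be used as a namedtuple field and both are replaced by positional names.

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3]], columns=['c1', 'my col', '_hidden'])

row = next(df.itertuples())

# field names actually available on the namedtuple
print(row._fields)

# 'c1' keeps its name; the other two are reachable by position
print(row.c1, row[2], row[3])
```

If you need the original names for such columns, accessing by position, or renaming the columns before iterating, avoids the surprise.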