How do I get the row count of a Pandas dataframe?
I'm trying to get the number of rows of dataframe df with Pandas, and here is my code.
Method 1:
total_rows = df.count
print total_rows +1
Method 2:
total_rows = df['First_columnn_label'].count
print total_rows +1
Both code snippets give me this error:
TypeError: unsupported operand type(s) for +: 'instancemethod' and 'int'
What am I doing wrong?
According to the answer given by @root, a concise way to check the length of df is to call:
df.shape[0]
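The error in the question comes from referencing df.count without calling it. The bare name is a bound method, so adding 1 to it raises a TypeError. Even when called, DataFrame.count() returns a Series of non-null counts per column rather than a single integer. The sketch below contrasts these with df.shape[0]; the small frame and its column names are illustrative assumptions.

```python
import numpy as np
import pandas as pd

# Illustrative frame: 4 rows, 3 columns (column names are hypothetical)
df = pd.DataFrame(np.arange(12).reshape(4, 3), columns=["a", "b", "c"])

# df.count without parentheses is the method object itself, not a number,
# which is why `df.count + 1` raises TypeError
print(callable(df.count))

# Calling it returns a Series holding one non-null count per column
print(df.count())

# df.shape is a (rows, columns) tuple; index 0 is the row count
total_rows = df.shape[0]
print(total_rows + 1)
```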
You can use the .shape property or just len(DataFrame.index). However, there are notable performance differences (len(DataFrame.index) is the fastest in the timings below):
In [1]: import numpy as np
In [2]: import pandas as pd
In [3]: df = pd.DataFrame(np.arange(12).reshape(4,3))
In [4]: df
Out[4]:
   0   1   2
0  0   1   2
1  3   4   5
2  6   7   8
3  9  10  11
In [5]: df.shape
Out[5]: (4, 3)
In [6]: timeit df.shape
1000000 loops, best of 3: 1.17 us per loop
In [7]: timeit df[0].count()
10000 loops, best of 3: 56 us per loop
In [8]: len(df.index)
Out[8]: 4
In [9]: timeit len(df.index)
1000000 loops, best of 3: 381 ns per loop
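The session above relies on IPython's timeit magic. Outside IPython, a similar comparison can be sketched with the standard-library timeit module. This is an illustrative sketch only: the absolute numbers depend on hardware and pandas version, so treat them as relative. df[0].count() is left out here because, unlike the others, it counts only non-NaN values.

```python
import timeit

import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(12).reshape(4, 3))

# Candidate expressions that all return the row count
candidates = {
    "df.shape[0]": lambda: df.shape[0],
    "len(df.index)": lambda: len(df.index),
    "len(df)": lambda: len(df),
}

# All candidates agree on the value...
counts = {name: fn() for name, fn in candidates.items()}
print(counts)

# ...so only the cost differs between them
n = 100_000
for name, fn in candidates.items():
    seconds = timeit.timeit(fn, number=n)
    print(f"{name:>14}: {seconds / n * 1e9:.0f} ns per call")
```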
EDIT: As @Dan Allen noted in the comments, len(df.index) and df[0].count() are not interchangeable, because count excludes NaNs.
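A minimal illustration of that difference, assuming a small frame in which column 0 contains one missing value:

```python
import numpy as np
import pandas as pd

# Column 0 contains one missing value
df = pd.DataFrame({0: [1.0, np.nan, 3.0], 1: [4, 5, 6]})

n_rows = len(df.index)   # counts every row, NaN or not -> 3
n_valid = df[0].count()  # counts only non-NaN entries in column 0 -> 2
print(n_rows, n_valid)
```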
Use len(df). This works as of pandas 0.11 or maybe even earlier. __len__() is currently (0.12) documented with "Returns length of index". Timing info, set up the same way as in root's answer:
In [7]: timeit len(df.index)
1000000 loops, best of 3: 248 ns per loop
In [8]: timeit len(df)
1000000 loops, best of 3: 573 ns per loop
Due to one additional function call, it is a bit slower than calling len(df.index) directly, but this should not matter in most use cases.
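Since __len__() returns the length of the index, len(df) counts rows, not columns; the column count comes from len(df.columns). A short sketch with an illustrative 3-row, 2-column frame, plus an empty frame that has columns but no rows:

```python
import pandas as pd

df = pd.DataFrame({"x": [10, 20, 30], "y": ["a", "b", "c"]})

# len(df) follows the index (rows); len(df.columns) gives the column count
print(len(df), len(df.index), len(df.columns))

# A frame with columns but no rows has length 0
empty = pd.DataFrame(columns=["x", "y"])
print(len(empty), empty.shape)
```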
Assuming df is your DataFrame, then:
Count_Row=df.shape[0]  # gives the number of rows
Count_Col=df.shape[1]  # gives the number of columns
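Because df.shape is a plain (rows, columns) tuple, both counts can also be unpacked in one step. The frame below is an illustrative assumption; df.size, the total number of elements, is shown for comparison.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3, 4], "b": [5, 6, 7, 8], "c": [9, 10, 11, 12]})

# Unpack the (n_rows, n_cols) tuple directly
count_row, count_col = df.shape
print(count_row, count_col)

# df.size is the total number of cells, i.e. rows * columns
print(df.size)
```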