How do I get the row count of a Pandas dataframe?

2018-06-25 08:01:38

I'm trying to get the number of rows of dataframe df with Pandas, and here is my code.

Method 1:

total_rows = df.count
print total_rows +1

Method 2:

total_rows = df['First_columnn_label'].count
print total_rows +1

Both the code snippets give me this error:

TypeError: unsupported operand type(s) for +: 'instancemethod' and 'int'

What am I doing wrong?

According to the answer given by @root the best (the fastest) way to check df length is to call:

df.shape[0]

You can use the .shape property or just len(DataFrame.index). However, there are notable performance differences (len(DataFrame.index) is the fastest in the timings below):

In [1]: import numpy as np

In [2]: import pandas as pd

In [3]: df = pd.DataFrame(np.arange(12).reshape(4,3))

In [4]: df
Out[4]: 
   0   1   2
0  0   1   2
1  3   4   5
2  6   7   8
3  9  10  11

In [5]: df.shape
Out[5]: (4, 3)

In [6]: timeit df.shape
1000000 loops, best of 3: 1.17 us per loop

In [7]: timeit df[0].count()
10000 loops, best of 3: 56 us per loop

In [8]: len(df.index)
Out[8]: 4

In [9]: timeit len(df.index)
1000000 loops, best of 3: 381 ns per loop

EDIT: As @Dan Allen noted in the comments, len(df.index) and df[0].count() are not interchangeable, because count excludes NaNs.
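To make the difference concrete, here is a minimal sketch. The frame and its column names "a" and "b" are made up for illustration and are not from the question. It also shows that count is a method and has to be called with parentheses, which is what the TypeError in the question is complaining about.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one missing value in column "a"
df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [4, 5, 6]})

n_rows = len(df.index)        # counts every row, NaN or not
n_non_null = df["a"].count()  # count is a method: note the parentheses

print(n_rows)      # 3
print(n_non_null)  # 2
```

So count() is only a row count when the column has no missing values.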

Use len(df). This works as of pandas 0.11, or maybe even earlier.

__len__() is currently (0.12) documented with "Returns length of index". Timing info, set up the same way as in root's answer:

In [7]: timeit len(df.index)
1000000 loops, best of 3: 248 ns per loop

In [8]: timeit len(df)
1000000 loops, best of 3: 573 ns per loop

Due to one additional function call, it is a bit slower than calling len(df.index) directly, but this should not matter in most use cases.
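The timings above come from IPython's timeit magic. Outside IPython, a rough equivalent can be sketched with the standard-library timeit module. The loop count of 100,000 is an arbitrary choice, and the absolute numbers will depend on the machine and the pandas version, so no expected timings are shown.

```python
import timeit

import numpy as np
import pandas as pd

# Same 4x3 frame as in the answers above
df = pd.DataFrame(np.arange(12).reshape(4, 3))

# The three ways of getting the row count that are being compared
candidates = {
    "len(df.index)": lambda: len(df.index),
    "len(df)": lambda: len(df),
    "df.shape[0]": lambda: df.shape[0],
}

n_calls = 100_000  # arbitrary; large enough to smooth out noise
timings = {}
for label, func in candidates.items():
    timings[label] = timeit.timeit(func, number=n_calls)
    per_call_ns = timings[label] / n_calls * 1e9
    print(f"{label:15s} {per_call_ns:8.0f} ns per call")

# All three return the same row count
row_counts = {label: func() for label, func in candidates.items()}
print(row_counts)
```

The lambda wrapper adds a small constant overhead to every candidate, so the figures are only useful for comparing the three against each other.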

Suppose df is your dataframe, then:

count_row = df.shape[0]  # number of rows
count_col = df.shape[1]  # number of columns
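Since shape is a plain tuple, both values can also be unpacked in one step. A small sketch, with a made-up frame and made-up variable names:

```python
import pandas as pd

# Hypothetical 4x3 frame
df = pd.DataFrame({"x": [1, 2, 3, 4], "y": [5, 6, 7, 8], "z": [9, 10, 11, 12]})

n_rows, n_cols = df.shape  # (rows, columns)
print(n_rows, n_cols)      # 4 3

# A frame with no rows still reports its columns
empty = df.iloc[0:0]
print(empty.shape)         # (0, 3)
```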
