How to understand axis=0 or axis=1 in pandas (Python)?
From the documentation: "the first running vertically downwards across rows (axis 0), and the second running horizontally across columns (axis 1)". And the code is:
import pandas as pd

df1 = pd.DataFrame({"x": [1, 2, 3, 4, 5],
                    "y": [3, 4, 5, 6, 7]},
                   index=['a', 'b', 'c', 'd', 'e'])
df2 = pd.DataFrame({"y": [1, 3, 5, 7, 9],
                    "z": [9, 8, 7, 6, 5]},
                   index=['b', 'c', 'd', 'e', 'f'])
pd.concat([df1, df2], join='inner')  # by default axis=0
Since axis=0 (which I interpret as columns), I think concat only considers columns that are found in both dataframes. But the actual output considers rows that are found in both dataframes (the only common row element 'y'). So how should I understand axis=0 and axis=1 correctly?
Data:
In [55]: df1
Out[55]:
x y
a 1 3
b 2 4
c 3 5
d 4 6
e 5 7
In [56]: df2
Out[56]:
y z
b 1 9
c 3 8
d 5 7
e 7 6
f 9 5
Concatenated horizontally (axis=1), using index elements found in both DFs (aligned by indexes for joining):
In [57]: pd.concat([df1, df2], join='inner', axis=1)
Out[57]:
x y y z
b 2 4 1 9
c 3 5 3 8
d 4 6 5 7
e 5 7 7 6
Concatenated vertically (DEFAULT: axis=0), using columns found in both DFs:
In [58]: pd.concat([df1, df2], join='inner')
Out[58]:
y
a 3
b 4
c 5
d 6
e 7
b 1
c 3
d 5
e 7
f 9
If you don't use the inner join method, you will have it this way:
In [62]: pd.concat([df1, df2])
Out[62]:
x y z
a 1.0 3 NaN
b 2.0 4 NaN
c 3.0 5 NaN
d 4.0 6 NaN
e 5.0 7 NaN
b NaN 1 9.0
c NaN 3 8.0
d NaN 5 7.0
e NaN 7 6.0
f NaN 9 5.0
In [63]: pd.concat([df1, df2], axis=1)
Out[63]:
x y y z
a 1.0 3.0 NaN NaN
b 2.0 4.0 1.0 9.0
c 3.0 5.0 3.0 8.0
d 4.0 6.0 5.0 7.0
e 5.0 7.0 7.0 6.0
f NaN NaN 9.0 5.0
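To see all four combinations side by side, here is a minimal sketch that loops over axis and join and prints the shape and labels of each result. The `results` dictionary is only an illustrative name, not something from the original posts.

```python
import pandas as pd

df1 = pd.DataFrame({"x": [1, 2, 3, 4, 5], "y": [3, 4, 5, 6, 7]},
                   index=["a", "b", "c", "d", "e"])
df2 = pd.DataFrame({"y": [1, 3, 5, 7, 9], "z": [9, 8, 7, 6, 5]},
                   index=["b", "c", "d", "e", "f"])

# Illustrative container: maps (axis, join) to the concatenated frame.
results = {}
for axis in (0, 1):
    for join in ("inner", "outer"):
        out = pd.concat([df1, df2], axis=axis, join=join)
        results[(axis, join)] = out
        print(f"axis={axis}, join={join!r}: shape={out.shape}, "
              f"index={list(out.index)}, columns={list(out.columns)}")
```

The axis argument picks the direction in which the frames are glued together, and join decides what happens to the labels on the other axis (intersection for 'inner', union for 'outer').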
Interpret axis=0 as applying the algorithm down each column, or to the row labels (the index). A more detailed schema here.
If you apply that general interpretation to your case, the algorithm here is concat. Thus axis=0 means: for each column, take all the rows down (across all the dataframes passed to concat) and concatenate them when the column is common to all of them (because you selected join='inner').
So the meaning would be to take all the x columns and concatenate them down the rows, which stacks each chunk of rows one after another. However, x is not present in every dataframe, so it is not kept in the final result. The same applies to z. For y, the result is kept, as y is in all dataframes. This is the result you have.
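The "keep only the columns found everywhere, then stack the rows" reading can be checked by hand. The sketch below rebuilds the inner result manually; `common_cols`, `manual` and `builtin` are illustrative names, and this is only a way to verify the interpretation, not how pandas implements it internally.

```python
import pandas as pd

df1 = pd.DataFrame({"x": [1, 2, 3, 4, 5], "y": [3, 4, 5, 6, 7]},
                   index=["a", "b", "c", "d", "e"])
df2 = pd.DataFrame({"y": [1, 3, 5, 7, 9], "z": [9, 8, 7, 6, 5]},
                   index=["b", "c", "d", "e", "f"])

# Columns present in both frames, kept in df1's order.
common_cols = [c for c in df1.columns if c in df2.columns]

# Restrict each frame to the shared columns, then stack the rows.
manual = pd.concat([df1[common_cols], df2[common_cols]])

# The same thing in one call.
builtin = pd.concat([df1, df2], join="inner")

print(common_cols)
print(manual.equals(builtin))
```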
First, OP misunderstood the rows and columns in his/her dataframe.
But the actual output considers rows that are found in both dataframes (the only common row element 'y')
OP thought the label y is for a row. However, y is a column name.
df1 = pd.DataFrame(
{"x":[1, 2, 3, 4, 5], # <-- looks like row x but actually col x
"y":[3, 4, 5, 6, 7]}, # <-- looks like row y but actually col y
index=['a', 'b', 'c', 'd', 'e'])
print(df1)
         col   x  y
index or row
           a   1  3   | a
           b   2  4   v x
           c   3  5   r i
           d   4  6   o s
           e   5  7   w 0
                -> column
                a x i s 1
It is very easy to be misled, since in the dictionary it looks like y and x are two rows.
If you generate df1 from a list of lists, it should be more intuitive:
df1 = pd.DataFrame([[1,3],
[2,4],
[3,5],
[4,6],
[5,7]],
index=['a', 'b', 'c', 'd', 'e'], columns=["x", "y"])
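As a quick sanity check that both constructors describe the same table, here is a small sketch comparing them. The names `from_dict` and `from_rows` are made up for this example.

```python
import pandas as pd

labels = ["a", "b", "c", "d", "e"]

# Dictionary form: each key is a COLUMN name, each list is that column's values.
from_dict = pd.DataFrame({"x": [1, 2, 3, 4, 5],
                          "y": [3, 4, 5, 6, 7]},
                         index=labels)

# List-of-lists form: each inner list is one ROW.
from_rows = pd.DataFrame([[1, 3],
                          [2, 4],
                          [3, 5],
                          [4, 6],
                          [5, 7]],
                         index=labels, columns=["x", "y"])

print(from_dict.equals(from_rows))
print(from_dict.shape)  # (number of rows, number of columns)
```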
So back to the problem: concat is shorthand for concatenate (which means to link together in a series or chain [source]). Performing concat along axis 0 means linking two objects along axis 0.
    1
    1   <-- series 1
    1
^   ^   ^
|   |   |               1
c   a   a               1
o   l   x               1
n   o   i   gives you   2
c   n   s               2
a   g   0               2
t   |   |
|   V   V
v
    2
    2   <--- series 2
    2
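The picture above can be reproduced with two small Series. This is a minimal sketch; `s1`, `s2`, `stacked` and `side_by_side` are illustrative names.

```python
import pandas as pd

s1 = pd.Series([1, 1, 1])
s2 = pd.Series([2, 2, 2])

# Link the two series along axis 0: one is placed under the other.
stacked = pd.concat([s1, s2], axis=0)
print(stacked.tolist())

# For contrast, along axis 1 they are placed next to each other as columns.
side_by_side = pd.concat([s1, s2], axis=1)
print(side_by_side.shape)
```

Note that the stacked result keeps the original index labels (0, 1, 2, 0, 1, 2) unless you pass ignore_index=True.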
So... I think you have the feeling now. What about the sum function in pandas? What does sum(axis=0) mean?
Suppose the data looks like
1 2
1 2
1 2
Maybe...summing along axis 0, you may guess. Yes!!
^   ^   ^
|   |   |
s   a   a
u   l   x
m   o   i   gives you two values 3 6 !
|   n   s
v   g   0
    |   |
    V   V
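You can confirm this with the same three-row data. A minimal sketch, where `data`, `col_sums` and `row_sums` are names chosen for the example:

```python
import pandas as pd

data = pd.DataFrame([[1, 2],
                     [1, 2],
                     [1, 2]])

# Summing along axis 0 runs down the rows, giving one value per column.
col_sums = data.sum(axis=0)
print(col_sums.tolist())  # [3, 6]

# Summing along axis 1 runs across the columns, giving one value per row.
row_sums = data.sum(axis=1)
print(row_sums.tolist())  # [3, 3, 3]
```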
What about dropna? Suppose you have the data
1 2 NaN
NaN 3 5
2 4 6
and you only want to keep
2
3
4
The documentation says: "Return object with labels on given axis omitted where alternately any or all of the data are missing".
Should you use dropna(axis=0) or dropna(axis=1)? Think about it and try it out with
import numpy as np
import pandas as pd

df = pd.DataFrame([[1, 2, np.nan],
                   [np.nan, 3, 5],
                   [2, 4, 6]])
# df.dropna(axis=0) or df.dropna(axis=1) ?
Hint: think about the word along.
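If you want to check your guess, here is a minimal sketch that runs both calls on the same data and prints what survives in each case. The names `by_axis0` and `by_axis1` are just illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1, 2, np.nan],
                   [np.nan, 3, 5],
                   [2, 4, 6]])

# The docs say the labels on the given axis are the ones that get omitted.
by_axis0 = df.dropna(axis=0)
by_axis1 = df.dropna(axis=1)

print("axis=0 ->", by_axis0.shape)
print(by_axis0)
print("axis=1 ->", by_axis1.shape)
print(by_axis1)
```

Compare the two printed frames with the 2, 3, 4 column you wanted to keep.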