Why is pandas '==' different than '.eq()'
Consider the series s
s = pd.Series([(1, 2), (3, 4), (5, 6)])
This is as expected
s == (3, 4)
0    False
1     True
2    False
dtype: bool
This is not
s.eq((3, 4))
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
ValueError: Lengths must be equal
I was under the assumption they were the same. What is the difference between them?
What does the documentation say?
Equivalent to series == other, but with support to substitute a fill_value for missing data in one of the inputs.
This seems to imply that they should work the same, hence the confusion.
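To make the documented difference concrete, here is a minimal sketch of the one case the documentation does describe. The Series `a` and `b` are made up for illustration and are not part of the question.

```python
import numpy as np
import pandas as pd

a = pd.Series([1.0, np.nan])
b = pd.Series([1.0, 0.0])

plain = a == b                   # NaN never compares equal to anything
filled = a.eq(b, fill_value=0)   # the NaN in `a` is replaced by 0 before comparing

print(plain.tolist())    # [True, False]
print(filled.tolist())   # [True, True]
```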
What you encounter is actually a special case that makes it easier to compare pandas.Series or numpy.ndarray with normal python constructs. The source code reads:
    def flex_wrapper(self, other, level=None, fill_value=None, axis=0):
        # validate axis
        if axis is not None:
            self._get_axis_number(axis)
        if isinstance(other, ABCSeries):
            return self._binop(other, op, level=level, fill_value=fill_value)
        elif isinstance(other, (np.ndarray, list, tuple)):
            if len(other) != len(self):
                # ---------------------------------------
                # you never reach the `==` path because you get into this.
                # ---------------------------------------
                raise ValueError('Lengths must be equal')
            return self._binop(self._constructor(other, self.index), op,
                               level=level, fill_value=fill_value)
        else:
            if fill_value is not None:
                self = self.fillna(fill_value)
            return self._constructor(op(self, other),
                                     self.index).__finalize__(self)
You're hitting the ValueError because pandas assumes for .eq that you wanted the value converted to a numpy.ndarray or pandas.Series (if you give it an array, list or tuple) instead of actually comparing it to the tuple. For example if you have:
s = pd.Series([1,2,3])
s.eq([1,2,3])
you wouldn't want it to compare each element to [1,2,3].
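As a small illustration of that position-by-position behaviour (the list values are chosen here just to give a mixed result):

```python
import pandas as pd

s = pd.Series([1, 2, 3])

# The list has the same length as the Series, so each position is compared in turn.
matches = s.eq([1, 5, 3])
print(matches.tolist())   # [True, False, True]
```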
The problem is that object arrays (as with dtype=uint) often slip through the cracks or are neglected on purpose. A simple if self.dtype != 'object' branch inside that method could resolve this issue. But maybe the developers had strong reasons to actually make this case different. I would advise asking for clarification by posting on their bug tracker.
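The idea of such a dtype check can be sketched outside of pandas as a wrapper. The function name `eq_treating_tuple_as_scalar` is hypothetical and this is not how pandas implements anything. It only shows what "treat a tuple as a single value when the Series holds objects" could look like.

```python
import pandas as pd


def eq_treating_tuple_as_scalar(series, other):
    """Hypothetical helper, not part of pandas."""
    if series.dtype == object and isinstance(other, tuple):
        # Compare every element to the tuple as one value.
        return series.map(lambda value: value == other)
    # Everything else keeps the normal .eq behaviour.
    return series.eq(other)


tuples = pd.Series([(1, 2), (3, 4), (5, 6)])
numbers = pd.Series([1, 2, 3])

print(eq_treating_tuple_as_scalar(tuples, (3, 4)).tolist())   # [False, True, False]
print(eq_treating_tuple_as_scalar(numbers, 2).tolist())       # [False, True, False]
```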
You haven't asked how you can make it work correctly, but for completeness I'll include one possibility (according to the source code it seems likely you need to wrap it as pandas.Series yourself):
>>> s.eq(pd.Series([(1, 2)]))
0     True
1    False
2    False
dtype: bool
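Note that a one-element Series only lines up with index 0. To test every row against the same tuple, one option (a sketch under the same assumption that wrapping in a Series is the way to go) is to repeat the tuple once per row on the same index. The names `target` and `repeated` are illustrative.

```python
import pandas as pd

s = pd.Series([(1, 2), (3, 4), (5, 6)])
target = (3, 4)

# One copy of the tuple per row, aligned on the index of `s`.
repeated = pd.Series([target] * len(s), index=s.index)

mask = s.eq(repeated)
print(mask.tolist())   # [False, True, False]
```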
== here compares each element of the Series to the tuple as a single value and yields a vector of truth values, whereas .eq treats a list, tuple or array argument as a second sequence to compare position by position, for which being of the same length is a requirement. Ayhan points to one exception: when you compare a pandas vector type using .eq(scalar value), the scalar value is simply broadcast to a vector of the same size for comparison.
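A short sketch of both cases, with values chosen only for illustration: a scalar is broadcast, while a sequence of the wrong length is rejected.

```python
import pandas as pd

s = pd.Series([1, 2, 3])

# A scalar is compared against every element.
broadcast = s.eq(2)
print(broadcast.tolist())   # [False, True, False]

# A list of a different length is treated as a sequence and refused.
try:
    s.eq([1, 2])
    length_error = False
except ValueError:
    length_error = True
print(length_error)
```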