Why is pandas '==' different from '.eq()'?

2018-06-13 05:49:05

Consider the series s

import pandas as pd

s = pd.Series([(1, 2), (3, 4), (5, 6)])

This is as expected

s == (3, 4)

0    False
1     True
2    False
dtype: bool

This is not

s.eq((3, 4))

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)

ValueError: Lengths must be equal

I was under the assumption they were the same. What is the difference between them?

What does the documentation say?

Equivalent to series == other, but with support to substitute a fill_value for missing data in one of the inputs.

This seems to imply that they should work the same, hence the confusion.
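
For reference, here is a minimal sketch of what the quoted fill_value sentence appears to mean in practice. The names left and right are made up for illustration, and both Series are assumed to share the same default index:

```python
import pandas as pd

# Illustrative data: `left` has one missing value, `right` has none.
left = pd.Series([1.0, float("nan"), 3.0])
right = pd.Series([1.0, 0.0, 3.0])

# Plain `==` compares NaN with 0.0 at position 1, which is False.
plain = left == right

# `.eq(..., fill_value=0)` substitutes 0 for the value that is missing
# on one side only, so position 1 compares 0 with 0.0.
filled = left.eq(right, fill_value=0)

print(plain.tolist())
print(filled.tolist())
```

So for ordinary Series inputs the two differ only in how missing data is handled, which is why the tuple case is surprising.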

What you encounter is actually a special case that makes it easier to compare a pandas.Series or numpy.ndarray with normal Python constructs. The source code reads:

def flex_wrapper(self, other, level=None, fill_value=None, axis=0):
    # validate axis
    if axis is not None:
        self._get_axis_number(axis)
    if isinstance(other, ABCSeries):
        return self._binop(other, op, level=level, fill_value=fill_value)
    elif isinstance(other, (np.ndarray, list, tuple)):
        if len(other) != len(self):
            # ---------------------------------------
            # you never reach the `==` path because you get into this.
            # ---------------------------------------
            raise ValueError('Lengths must be equal')  
        return self._binop(self._constructor(other, self.index), op,
                           level=level, fill_value=fill_value)
    else:
        if fill_value is not None:
            self = self.fillna(fill_value)

        return self._constructor(op(self, other),
                                 self.index).__finalize__(self)

You're hitting the ValueError because, for .eq, pandas assumes that an array, list, or tuple argument should be converted to a numpy.ndarray or pandas.Series and compared position by position, instead of being compared as a single value against each element. For example, if you have:

s = pd.Series([1,2,3])
s.eq([1,2,3])

you wouldn't want it to compare each element to the whole list [1, 2, 3].
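
A small sketch of that list-like branch, assuming a pandas version whose .eq still follows the source quoted above. The variable names are illustrative only:

```python
import pandas as pd

numbers = pd.Series([1, 2, 3])

# A list of the same length is compared position by position.
positional = numbers.eq([1, 5, 3])
print(positional.tolist())

# A list of a different length cannot be lined up with the Series,
# so the length check in the list/tuple/ndarray branch raises.
try:
    numbers.eq([1, 2])
except ValueError as exc:
    error_message = str(exc)
else:
    error_message = None

print(error_message)
```

The tuple (3, 4) in the question goes down the same branch: it has length 2, the Series has length 3, and the length check fails before any element is compared.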

The problem is that object arrays (as with dtype=uint) often slip through the cracks or are neglected on purpose. A simple if self.dtype != 'object' branch inside that method could resolve this issue. But maybe the developers had strong reasons to make this case different. I would advise asking for clarification by posting on their bug tracker.
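
To make the idea of a dtype check concrete, here is a rough sketch written as a user-side wrapper. The function eq_tuple_aware is hypothetical and not part of pandas; it only illustrates the kind of branch described above and is not a proposed patch:

```python
import pandas as pd


def eq_tuple_aware(series, other):
    """Hypothetical helper (not a pandas API).

    For an object-dtype Series, treat a tuple as a single value and
    compare it against every element; otherwise defer to Series.eq.
    """
    if series.dtype == object and isinstance(other, tuple):
        return series.apply(lambda item: item == other)
    return series.eq(other)


s = pd.Series([(1, 2), (3, 4), (5, 6)])
mask = eq_tuple_aware(s, (3, 4))
print(mask.tolist())

# Numeric Series keep the usual position-by-position behaviour.
numeric_mask = eq_tuple_aware(pd.Series([1, 2, 3]), (1, 2, 3))
print(numeric_mask.tolist())
```

One caveat with any such branch: on an object Series, a tuple whose length happens to equal the Series length becomes ambiguous, which may be one reason the library does not guess.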

You haven't asked how to make it work correctly, but for completeness I'll include one possibility (according to the source code, it seems likely you need to wrap it as a pandas.Series yourself):

>>> s.eq(pd.Series([(1, 2)]))
0     True
1    False
2    False
dtype: bool
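
Note that a one-element Series is aligned on the index, so only position 0 has a partner to compare with. If the goal is to test every element against the same tuple, one option is to repeat the tuple on the Series' own index. This is only a sketch, and the name target is made up for illustration:

```python
import pandas as pd

s = pd.Series([(1, 2), (3, 4), (5, 6)])

# Build an object Series holding the same tuple at every index label,
# so that .eq takes the Series branch and compares element by element.
target = pd.Series([(3, 4)] * len(s), index=s.index)
mask = s.eq(target)

print(mask.tolist())
```

This takes the Series branch of the quoted source, so the length check in the list/tuple branch is never reached.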

== is an element-wise comparison that yields a vector of truth values. .eq is element-wise as well, but it treats a list, tuple, or array argument as a sequence of values to compare position by position, for which being of the same length is a requirement. Ayhan points to one exception: when you call .eq(scalar value) on a pandas vector type, the scalar value is simply broadcast to a vector of the same size for comparison.
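
The scalar case can be checked directly. A minimal sketch with illustrative names:

```python
import pandas as pd

numbers = pd.Series([1, 2, 3])

# A scalar is broadcast against every element, both for the method
# and for the operator.
via_method = numbers.eq(2)
via_operator = numbers == 2

print(via_method.tolist())
print(via_operator.tolist())
```

For a true scalar the two spellings agree. The difference in the question only appears because a tuple is treated as list-like by .eq.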
