Pandas DataFrame.assign arguments

2018-06-25 07:54:08

Question

How can assign be used to return a copy of the original DataFrame with multiple new columns added?

Desired result

>>> import pandas as pd
>>> df = pd.DataFrame({'A': range(1, 5), 'B': range(11, 15)})
>>> df.assign({'C': df.A.apply(lambda x: x ** 2), 'D': df.B * 2})
   A   B   C   D
0  1  11   1  22
1  2  12   4  24
2  3  13   9  26
3  4  14  16  28

ATTEMPTS

The example above results in:

ValueError: Wrong number of items passed 2, placement implies 1.

Background

The assign function in Pandas returns a copy of the DataFrame with the newly assigned columns added, e.g.

>>> df = df.assign(C=df.B * 2)
>>> df
   A   B   C
0  1  11  22
1  2  12  24
2  3  13  26
3  4  14  28

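The 0.19.2 documentation for this function implies that more than one column can be added to the DataFrame:

Assigning multiple columns within the same assign is possible, but you cannot reference other columns created within the same assign call.

In addition:

Parameters:
kwargs : keyword, value pairs

keywords are the column names.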
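The restriction quoted above (no references to columns created in the same assign call) can be sidestepped by chaining two assign calls. The following is only a minimal sketch, assuming the same example df as in the question; the column name E and the variable name result are hypothetical and chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame({'A': range(1, 5), 'B': range(11, 15)})

# The first assign creates C. The second assign is called on the
# intermediate copy, so its lambda receives a DataFrame that already
# contains C and can safely refer to it.
result = (
    df.assign(C=df.A ** 2)
      .assign(E=lambda d: d.C + d.B)  # E is a hypothetical column name
)
print(result)
```

Because each assign returns a new copy, df itself is left unchanged.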

The source code of the function indicates that it accepts a dictionary:

def assign(self, **kwargs):
    """
    .. versionadded:: 0.16.0
    Parameters
    ----------
    kwargs : keyword, value pairs
        keywords are the column names. If the values are callable, they are computed 
        on the DataFrame and assigned to the new columns. If the values are not callable, 
        (e.g. a Series, scalar, or array), they are simply assigned.

    Notes
    -----
    Since ``kwargs`` is a dictionary, the order of your
    arguments may not be preserved. To make things predictable,
    the columns are inserted in alphabetical order, at the end of
    your DataFrame. Assigning multiple columns within the same
    ``assign`` is possible, but you cannot reference other columns
    created within the same ``assign`` call.
    """

    data = self.copy()

    # do all calculations first...
    results = {}
    for k, v in kwargs.items():

        if callable(v):
            results[k] = v(data)
        else:
            results[k] = v

    # ... and then assign
    for k, v in sorted(results.items()):
        data[k] = v

    return data
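The docstring distinguishes callable values from non-callable ones. As a small illustration (a sketch using the question's example data, not taken from the pandas documentation), the call below mixes both kinds: C is given as a lambda that assign calls with the DataFrame, and D is given as an already computed Series.

```python
import pandas as pd

df = pd.DataFrame({'A': range(1, 5), 'B': range(11, 15)})

result = df.assign(
    C=lambda d: d.A ** 2,  # callable: evaluated on the DataFrame by assign
    D=df.B * 2,            # non-callable Series: assigned as-is
)
print(result)
```

Both forms are passed as keyword arguments, which is what the `**kwargs` signature collects into a dictionary inside the function.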

You can create multiple columns by supplying each new column as a keyword argument:

df = df.assign(C=df['A']**2, D=df.B*2)

I got your example dictionary to work by unpacking it into keyword arguments with **:

df = df.assign(**{'C': df.A.apply(lambda x: x ** 2), 'D': df.B * 2})

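It seems that assign should be able to take a dictionary, but based on the source code you posted, it does not appear to be supported currently.

Resulting output: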

   A   B   C   D
0  1  11   1  22
1  2  12   4  24
2  3  13   9  26
3  4  14  16  28
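To make the difference between the two call styles concrete, here is a minimal sketch that tries both on the same dictionary. It assumes a pandas version whose signature is assign(self, **kwargs), as in the source quoted in the question. The exact exception raised for the positional call may differ between versions, so the code catches both TypeError and ValueError. The names new_cols and positional_ok are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({'A': range(1, 5), 'B': range(11, 15)})
new_cols = {'C': df.A ** 2, 'D': df.B * 2}  # hypothetical helper dict

# Passing the dict positionally: **kwargs only collects keyword
# arguments, so the call is expected to be rejected.
try:
    df.assign(new_cols)
    positional_ok = True
except (TypeError, ValueError) as exc:
    positional_ok = False
    print(type(exc).__name__, exc)

# Unpacking the dict turns each key into a keyword argument.
result = df.assign(**new_cols)
print(result)
```

One limitation of the unpacking approach is that every dictionary key must be a string, since the keys become keyword argument names.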
