Pandas DataFrame.assign arguments
QUESTION
How can assign be used to return a copy of the original DataFrame with multiple new columns added?
DESIRED RESULT
>>> import pandas as pd
>>> df = pd.DataFrame({'A': range(1, 5), 'B': range(11, 15)})
>>> df.assign({'C': df.A.apply(lambda x: x ** 2), 'D': df.B * 2})
   A   B   C   D
0  1  11   1  22
1  2  12   4  24
2  3  13   9  26
3  4  14  16  28
ATTEMPTS
The example above results in:
ValueError: Wrong number of items passed 2, placement implies 1
BACKGROUND
The assign function in Pandas returns a copy of the DataFrame with the newly assigned column added, e.g.:
>>> df = df.assign(C=df.B * 2)
>>> df
   A   B   C
0  1  11  22
1  2  12  24
2  3  13  26
3  4  14  28
The 0.19.2 documentation for this function implies that more than one column can be added to the DataFrame:
"Assigning multiple columns within the same assign is possible, but you cannot reference other columns created within the same assign call."
In addition:
Parameters:
kwargs : keyword, value pairs
keywords are the column names.
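One way to work around the restriction quoted above is to chain two assign calls, so the second call sees the column created by the first. This is a minimal sketch, not something taken from the documentation; the column name E is purely illustrative. Newer pandas releases relax the restriction, but chaining should behave the same way regardless of version.

```python
import pandas as pd

df = pd.DataFrame({'A': range(1, 5), 'B': range(11, 15)})

# The first call adds C. The lambda in the second call receives the
# intermediate DataFrame, so it can read the C column created just before.
# 'E' is a hypothetical column name used only for this illustration.
result = df.assign(C=df.A ** 2).assign(E=lambda d: d.C + d.B)

print(result)
```

Each assign call returns a new DataFrame, so df itself still has only the columns A and B afterwards.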
The source code for the function suggests that it accepts a dictionary:
def assign(self, **kwargs):
    """
    .. versionadded:: 0.16.0

    Parameters
    ----------
    kwargs : keyword, value pairs
        keywords are the column names. If the values are callable, they are computed
        on the DataFrame and assigned to the new columns. If the values are not callable,
        (e.g. a Series, scalar, or array), they are simply assigned.

    Notes
    -----
    Since ``kwargs`` is a dictionary, the order of your
    arguments may not be preserved. To make things predictable,
    the columns are inserted in alphabetical order, at the end of
    your DataFrame. Assigning multiple columns within the same
    ``assign`` is possible, but you cannot reference other columns
    created within the same ``assign`` call.
    """
    data = self.copy()

    # do all calculations first...
    results = {}
    for k, v in kwargs.items():
        if callable(v):
            results[k] = v(data)
        else:
            results[k] = v

    # ... and then assign
    for k, v in sorted(results.items()):
        data[k] = v

    return data
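The docstring's distinction between callable and non-callable values can be illustrated with a short sketch. The column names C, D and F and the scalar value 0 are assumptions chosen for the example; they are named alphabetically so the resulting column order does not depend on how a given pandas version orders new columns.

```python
import pandas as pd

df = pd.DataFrame({'A': range(1, 5), 'B': range(11, 15)})

result = df.assign(
    C=lambda d: d.A ** 2,  # callable: invoked with the DataFrame, result is assigned
    D=df.B * 2,            # Series: assigned as-is
    F=0,                   # scalar: broadcast to every row
)

print(result)
```

Passing a callable is mostly useful in method chains, where the intermediate DataFrame has no name to refer to.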
You can create multiple columns by supplying each new column as a keyword argument:
df = df.assign(C=df['A']**2, D=df.B*2)
I got your example dictionary to work by unpacking it as keyword arguments using **:
df = df.assign(**{'C': df.A.apply(lambda x: x ** 2), 'D': df.B * 2})
It seems like assign should be able to take a dictionary, but based on the source code you posted, that does not appear to be currently supported.
The resulting output:
   A   B   C   D
0  1  11   1  22
1  2  12   4  24
2  3  13   9  26
3  4  14  16  28
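As a self-contained sketch of the same unpacking approach, the dictionary can be built ahead of time and then passed with **. The variable names new_columns and result are illustrative only. The example also checks that the original DataFrame is left untouched, since assign works on a copy.

```python
import pandas as pd

df = pd.DataFrame({'A': range(1, 5), 'B': range(11, 15)})

# Build the new columns as an ordinary dictionary (keys must be strings),
# then unpack it so each key/value pair becomes a keyword argument.
new_columns = {'C': df.A ** 2, 'D': df.B * 2}
result = df.assign(**new_columns)

print(list(df.columns))      # ['A', 'B'] -> the original is unchanged
print(list(result.columns))  # ['A', 'B', 'C', 'D']
```

This is handy when the set of columns to add is only known at runtime, for example when it is generated in a loop.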