Change data type of columns in Pandas

I want to convert a table, represented as a list of lists, into a Pandas DataFrame. As an extremely simplified example:

import pandas as pd
a = [['a', '1.2', '4.2'], ['b', '70', '0.03'], ['x', '5', '0']]
df = pd.DataFrame(a)

What is the best way to convert the columns to the appropriate types, in this case columns 2 and 3 into floats? Is there a way to specify the types while converting to a DataFrame? Or is it better to create the DataFrame first and then loop through the columns to change the type of each one? Ideally I would like to do this dynamically, because there can be hundreds of columns and I don't want to specify exactly which columns are of which type. All I can guarantee is that each column contains values of the same type.


You can use pd.to_numeric (introduced in version 0.17) to convert a column or a Series to a numeric type. The function can also be applied over multiple columns of a DataFrame using apply.

Importantly, the function also takes an errors keyword argument that lets you force non-numeric values to be NaN, or simply ignore columns containing these values.

Example uses are shown below.

Individual column / Series

Here's an example using a Series of strings s which has the object dtype:

>>> s = pd.Series(['1', '2', '4.7', 'pandas', '10'])
>>> s
0         1
1         2
2       4.7
3    pandas
4        10
dtype: object

The function's default behaviour is to raise if it can't convert a value. In this case, it can't cope with the string 'pandas':

>>> pd.to_numeric(s) # or pd.to_numeric(s, errors='raise')
ValueError: Unable to parse string

Rather than fail, we might want 'pandas' to be considered a missing/bad value. We can coerce invalid values to NaN as follows:

>>> pd.to_numeric(s, errors='coerce')
0     1.0
1     2.0
2     4.7
3     NaN
4    10.0
dtype: float64
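Since coercion silently replaces bad values, it can be useful to see which entries were lost. A minimal sketch, assuming the same Series s as above; the names converted and failed are just illustrative:

```python
import pandas as pd

s = pd.Series(['1', '2', '4.7', 'pandas', '10'])

# Unparseable entries become NaN instead of raising.
converted = pd.to_numeric(s, errors='coerce')

# Entries that were present in the input but are NaN after conversion
# are the ones that could not be parsed.
failed = s[converted.isna() & s.notna()]
print(failed.tolist())
```

Inspecting failed before discarding the original Series is a cheap way to check that only genuinely bad values were turned into NaN.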

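The third option is just to ignore the operation if an invalid value is encountered (note that errors='ignore' is deprecated as of pandas 2.2, where catching the exception yourself is the suggested replacement):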

>>> pd.to_numeric(s, errors='ignore')
# the original Series is returned untouched

Multiple columns / entire DataFrames

We might want to apply this operation to multiple columns. Processing each column in turn is tedious, so we can use DataFrame.apply to have the function act on each column.

Borrowing the DataFrame from the question:

>>> a = [['a', '1.2', '4.2'], ['b', '70', '0.03'], ['x', '5', '0']]
>>> df = pd.DataFrame(a, columns=['col1','col2','col3'])
>>> df
  col1 col2  col3
0    a  1.2   4.2
1    b   70  0.03
2    x    5     0

Then we can write:

df[['col2','col3']] = df[['col2','col3']].apply(pd.to_numeric)

and now 'col2' and 'col3' have dtype float64 as desired.
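To confirm the result, you can inspect the dtypes and try some arithmetic on the converted columns. A small sketch using the same DataFrame as above; the variable total is only there for illustration:

```python
import pandas as pd

a = [['a', '1.2', '4.2'], ['b', '70', '0.03'], ['x', '5', '0']]
df = pd.DataFrame(a, columns=['col1', 'col2', 'col3'])

# Convert only the two columns known to hold numeric strings.
df[['col2', 'col3']] = df[['col2', 'col3']].apply(pd.to_numeric)

# Show the dtype of every column after the conversion.
print(df.dtypes)

# Numeric columns now support arithmetic such as a column sum.
total = df['col2'].sum()
print(total)
```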

However, we might not know which of our columns can be converted reliably to a numeric type. In that case we can just write:

df.apply(pd.to_numeric, errors='ignore')

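Then the function will be applied to the whole DataFrame. Columns that can be converted to a numeric type will be converted, while columns that cannot (e.g. they contain non-digit strings or dates) will be left alone. Note that apply returns a new DataFrame rather than modifying df in place, so assign the result back if you want to keep it.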
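On recent pandas versions, where errors='ignore' is deprecated, the same "convert what you can" behaviour can be written with an explicit try/except. This is only a sketch: to_numeric_if_possible is a hypothetical helper name, not part of pandas.

```python
import pandas as pd

def to_numeric_if_possible(column):
    """Hypothetical helper: return a numeric column, or the input unchanged."""
    try:
        return pd.to_numeric(column)
    except (ValueError, TypeError):
        # The column holds something that cannot be parsed; leave it alone.
        return column

a = [['a', '1.2', '4.2'], ['b', '70', '0.03'], ['x', '5', '0']]
df = pd.DataFrame(a, columns=['col1', 'col2', 'col3'])

# apply calls the helper once per column and builds a new DataFrame.
converted = df.apply(to_numeric_if_possible)
print(converted.dtypes)
```

Because the helper is applied per column, this scales to hundreds of columns without naming any of them, which is what the question asks for.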

There are also pd.to_datetime and pd.to_timedelta for conversion to datetimes and timedeltas.
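These two functions follow the same pattern as pd.to_numeric. A short sketch with made-up sample values (the Series contents below are assumptions for illustration only):

```python
import pandas as pd

# Made-up sample data: date strings (one invalid) and duration strings.
dates = pd.Series(['2021-01-15', '2021-02-20', 'not a date'])
durations = pd.Series(['1 days', '2 hours', '30 minutes'])

# Invalid dates become NaT when errors='coerce' is used.
parsed_dates = pd.to_datetime(dates, errors='coerce')

# Duration strings are parsed into timedelta values.
parsed_durations = pd.to_timedelta(durations)

print(parsed_dates)
print(parsed_durations)
```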

Soft conversions

Version 0.21.0 introduced the method infer_objects() for converting columns of a DataFrame that have an object dtype to a more specific type.

For example, let's create a DataFrame with two columns of object type, with one holding integers and the other holding strings of integers:

>>> df = pd.DataFrame({'a': [7, 1, 5], 'b': ['3','2','1']}, dtype='object')
>>> df.dtypes
a    object
b    object
dtype: object

Then using infer_objects(), we can change the type of column 'a' to int64:

>>> df = df.infer_objects()
>>> df.dtypes
a     int64
b    object
dtype: object

Column 'b' has been left alone since its values were strings, not integers. If we wanted to try and force the conversion of both columns to an integer type, we could use df.astype(int) instead.
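A quick sketch of that forced conversion, using the same two-column DataFrame. Unlike infer_objects(), astype(int) raises if any value cannot be interpreted as an integer; the Series bad below is a made-up example to show that:

```python
import pandas as pd

df = pd.DataFrame({'a': [7, 1, 5], 'b': ['3', '2', '1']}, dtype='object')

# Force both columns to an integer type; the strings in 'b' are parsed.
df = df.astype(int)
print(df.dtypes)

# A value that is not an integer string makes astype(int) fail.
bad = pd.Series(['3', 'x'], dtype='object')
try:
    bad.astype(int)
    raised = False
except ValueError:
    raised = True
print(raised)
```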


How about this?

a = [['a', '1.2', '4.2'], ['b', '70', '0.03'], ['x', '5', '0']]
df = pd.DataFrame(a, columns=['one', 'two', 'three'])
df
Out[16]: 
  one  two three
0   a  1.2   4.2
1   b   70  0.03
2   x    5     0

df.dtypes
Out[17]: 
one      object
two      object
three    object

df[['two', 'three']] = df[['two', 'three']].astype(float)

df.dtypes
Out[19]: 
one       object
two      float64
three    float64

The code below changes the data type of the given columns:

df[['col.name1', 'col.name2', ...]] = df[['col.name1', 'col.name2', ...]].astype('data_type')

In place of 'data_type', pass whichever type you want, such as str, float or int.
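If different columns need different types, astype also accepts a dictionary mapping column names to types. A minimal sketch using the DataFrame from the earlier example (the column names are the ones chosen there):

```python
import pandas as pd

a = [['a', '1.2', '4.2'], ['b', '70', '0.03'], ['x', '5', '0']]
df = pd.DataFrame(a, columns=['one', 'two', 'three'])

# Map each column to its target type; unlisted columns are left unchanged.
df = df.astype({'two': float, 'three': float})
print(df.dtypes)
```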
