How can I convert a factor variable with missing values to a numeric variable?

2018-06-08 05:26:14

I loaded my dataset (original.csv) into R: original <- read.csv("original.csv")

str(original) showed that my dataset has 16 variables (14 factors, 2 integers). 14 variables have missing values. That is fine, but 3 variables that are actually numeric were read in as factors.

I searched the web and found this command: as.numeric(as.character(original$Tumor_Size)) (Tumor_Size is one of the variables that was read in as a factor).

By the way, missing values in my dataset are marked with a dot (.).

After running as.numeric(as.character(original$Tumor_Size)), the values of Tumor_Size were printed, followed by the warning message: "NAs introduced by coercion".

I expected the variable to be converted to numeric after running the command above, but a second str(original) showed that I was wrong: Tumor_Size and the other two variables were still factors. Below is a sample of my dataset: [image: a piece of my dataset]

How can I solve my problem?

The crucial information here is how missing values are encoded in your data file. The corresponding argument in read.csv() is called na.strings. So if dots are used:

original <- read.csv("original.csv", na.strings = ".")
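As a minimal sketch of the effect, assuming a tiny inline CSV in place of original.csv (the Grade column and all values are made up for illustration):

```r
# Hypothetical stand-in for original.csv; "." marks a missing value.
csv_text <- "Tumor_Size,Grade\n2.5,1\n.,2\n4,.\n"

# Without na.strings, the dot forces the column to be read as text
# (a factor in R < 4.0, a character vector in R >= 4.0).
raw <- read.csv(text = csv_text)

# With na.strings = ".", the dot is read as NA and the column stays numeric.
clean <- read.csv(text = csv_text, na.strings = ".")

str(raw)
str(clean)
```

Comparing the two str() outputs should show Tumor_Size as text in raw and as a numeric column with an NA in clean.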

I'm not 100% sure what your problem is but maybe this will help....

original <- read.csv("original.csv", header = TRUE, stringsAsFactors = FALSE)
original$Tumor_Size <- as.numeric(original$Tumor_Size)

This will introduce NAs because the dot (.) cannot be converted to a numeric value. If you replace the NAs with a dot again, the column is coerced back to character. To do that you can use:

original$Tumor_Size[is.na(original$Tumor_Size)] <- "."
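One likely reason the second str(original) in the question still showed a factor is that the converted values were only printed and never assigned back to the column. The sketch below uses a made-up data frame (df is a hypothetical stand-in for original) to show that, along with the coercion back to character:

```r
# Made-up factor column standing in for original$Tumor_Size.
df <- data.frame(Tumor_Size = factor(c("2.5", ".", "4")))

# Converting without writing the result back into df leaves df unchanged.
tmp <- suppressWarnings(as.numeric(as.character(df$Tumor_Size)))
is.factor(df$Tumor_Size)

# Assigning back changes the column; "." becomes NA
# (the coercion warning is suppressed here only to keep the output short).
df$Tumor_Size <- suppressWarnings(as.numeric(as.character(df$Tumor_Size)))
is.numeric(df$Tumor_Size)

# Writing "." back into the NA slots coerces the whole vector to character.
back <- df$Tumor_Size
back[is.na(back)] <- "."
class(back)
```

Keeping the NAs is usually what you want for analysis, since functions such as mean() accept na.rm = TRUE.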

Hope this helps.
