How can I convert a factor variable with missing values to a numeric variable?

  • I loaded my dataset (original.csv) into R: original <- read.csv("original.csv")
  • str(original) showed that my dataset has 16 variables (14 factors, 2 integers), and 14 of them have missing values. That was fine, but 3 variables that are numbers in the original file were read as factors.
  • I searched the web and found this command: as.numeric(as.character(original$Tumor_Size)) (Tumor_Size is one of the variables that was read as a factor).
  • Note that missing values in my dataset are marked with a dot (.).
  • After running as.numeric(as.character(original$Tumor_Size)), the values of Tumor_Size were printed, and at the end a warning message appeared: “NAs introduced by coercion”.
  • I expected the command to convert the variable to numeric, but running str(original) a second time showed that my guess was wrong: Tumor_Size and the other two variables were still factors. Below is a sample of my dataset: [image: a piece of my dataset]
  • How can I solve my problem?
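A likely cause, sketched below with a made-up data frame: as.numeric(as.character(...)) returns a new vector and prints it, but it does not modify the data frame it was given, so the result has to be assigned back to the column. The sketch also shows why the as.character() step is needed. The data frame `df`, the vector `f`, and their values are hypothetical.

```r
# Hypothetical data frame mimicking the question; values are invented
df <- data.frame(Tumor_Size = factor(c("2.5", ".", "10", "4.1")))

# Calling the conversion on its own only prints a new vector;
# df itself is unchanged (this also warns "NAs introduced by coercion")
as.numeric(as.character(df$Tumor_Size))
str(df)  # Tumor_Size is still a factor

# Assign the result back to the column to actually change it;
# "." cannot be parsed as a number, so it becomes NA (with the same warning)
df$Tumor_Size <- as.numeric(as.character(df$Tumor_Size))
str(df)  # Tumor_Size is now numeric

# Pitfall: as.numeric() applied directly to a factor returns the
# internal level codes, not the numbers shown in the labels
f <- factor(c("2.5", ".", "10", "4.1"))
as.numeric(f)
```

The as.character() step matters because a factor is stored as integer codes that index its levels; converting to character first recovers the labels, which as.numeric() can then parse.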


    The crucial information here is how missing values are encoded in your data file. The corresponding argument in read.csv() is na.strings. So if dots are used:

    original <- read.csv("original.csv", na.strings = ".")
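A small self-contained sketch of the effect of na.strings, using read.csv()'s `text` argument in place of a file. The CSV content and the columns other than Tumor_Size are invented for illustration.

```r
# Hypothetical sample mimicking the question's layout; values are made up
csv_text <- "ID,Tumor_Size,Grade
1,2.5,II
2,.,III
3,4.1,."

# Without na.strings, "." is an ordinary string, so Tumor_Size is not
# numeric (factor in R < 4.0, character in R >= 4.0)
bad <- read.csv(text = csv_text)
str(bad)

# Declaring "." as the missing-value marker lets R parse the column as numeric
good <- read.csv(text = csv_text, na.strings = ".")
str(good)
```

Setting na.strings replaces its default value of "NA", so if a file uses both markers, na.strings = c(".", "NA") keeps both.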
    

    I'm not 100% sure what your problem is, but maybe this will help:

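    # Read text columns as character instead of factor
    original <- read.csv("original.csv", header = TRUE, stringsAsFactors = FALSE)
    # Convert the character column to numeric; "." becomes NA
    original$Tumor_Size <- as.numeric(original$Tumor_Size)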
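Since the question mentions three affected variables, the same approach can be applied to several columns at once with lapply(). This is a sketch on inline CSV text; the column names Age and Nodes, the vector `num_cols`, and all values are hypothetical.

```r
# Hypothetical CSV content; only Tumor_Size comes from the question
csv_text <- "Tumor_Size,Age,Nodes,Grade
2.5,54,3,II
.,61,.,III
4.1,.,0,I"

original <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Columns that hold numbers but were read as character because of "."
num_cols <- c("Tumor_Size", "Age", "Nodes")

# Convert each listed column; every "." becomes NA, and each
# as.numeric() call warns "NAs introduced by coercion"
original[num_cols] <- lapply(original[num_cols], as.numeric)
str(original)
```

The warning is expected here; it would only signal a real problem if a column contained some other non-numeric text besides ".".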
    

    This will introduce NAs (with a warning) because a dot (.) cannot be converted to a numeric value. If you replace the NAs with a dot again, the whole column becomes character. To do this you can use:

    original$Tumor_Size[is.na(original$Tumor_Size)] <- "."
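A minimal illustration of why the column turns into character: an R atomic vector holds a single type, so assigning a string into any element coerces every element. The vector `x` is hypothetical.

```r
# Hypothetical numeric column after conversion
x <- c(2.5, NA, 4.1)

# Writing "." into the missing elements forces the whole vector to character
x[is.na(x)] <- "."
class(x)  # "character"
x         # "2.5" "." "4.1"
```

Leaving the missing values as NA keeps the column numeric, which is usually what later analysis needs.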
    

    Hope this helps.
