How to preserve original values in a variable turned into a factor?
Here's some working code to illustrate my question:
# Categorical variable recorded as numeric (integer)
df1 <- data.frame(group = c(1, 2, 3, 9, 3, 2, 9, 1, 9, 3, 2))
I have a categorical variable (group) recorded as integer values. For plots, and to include this variable in models, it would be useful to have it encoded as a factor, mapping each number to a label describing the category. So I create a factor:
# Make it a factor
df1$group_f <- factor(x = df1$group,
levels = c(1, 2, 3, 9),
labels = c("G1", "G2", "G3", "Unknown"))
df1
group group_f
1 1 G1
2 2 G2
3 3 G3
4 9 Unknown
5 3 G3
6 2 G2
7 9 Unknown
8 1 G1
9 9 Unknown
10 3 G3
11 2 G2
Now, the problem is that eventually I need the original values again, because I have to join tables based on this variable, and the other table has the original numbers for each category (1, 2, 3, 9) and not the labels.
Converting to numeric does not work (the "Unknown" category gets mapped to 4 instead of 9):
# And back to numeric
df1$group_num <- as.numeric(df1$group_f)
df1
group group_f group_num
1 1 G1 1
2 2 G2 2
3 3 G3 3
4 9 Unknown 4
5 3 G3 3
6 2 G2 2
7 9 Unknown 4
8 1 G1 1
9 9 Unknown 4
10 3 G3 3
11 2 G2 2
?factor says:
as.numeric applied to a factor is meaningless, and may happen by implicit coercion. To transform a factor f to approximately its original numeric values, as.numeric(levels(f))[f] is recommended and slightly more efficient than as.numeric(as.character(f)).
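For reference, here is a minimal sketch of what that recommendation does when the factor is built without custom labels, so that the levels are still the numbers stored as character strings. The variable names f, codes and recovered are only illustrative:

```r
# Factor created without labels: the levels are "1", "2", "3", "9"
f <- factor(c(1, 2, 3, 9, 3, 2, 9, 1, 9, 3, 2))

# as.numeric() on the factor returns the internal integer codes,
# i.e. the position of each value among the levels, not the value itself
codes <- as.numeric(f)

# Converting the levels first and then indexing by the factor
# recovers the original numbers
recovered <- as.numeric(levels(f))[f]
```

Here codes contains 4 wherever the data had 9, while recovered contains 9 again. This only works because the levels themselves can still be parsed as numbers.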
But as.numeric over the levels does not work either, because the levels are now the character labels, which cannot be coerced to numeric:
> as.numeric(levels(df1$group_f))
[1] NA NA NA NA
Warning message:
NAs introduced by coercion
Is there a way to create a factor variable so that it preserves the original values (1, 2, 3, 9 in this example)?
Note: the idea is to have one single factor variable that has the labels describing the categories and the original numbers underlying them. Although in this example I keep the variable group alongside the newly created factor variable, in my real use case I cannot do that (it is a huge dataset).
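To make the goal concrete, the closest workaround I can sketch keeps the original codes in a small separate vector and indexes it by the factor. This is only an assumption about what an acceptable solution might look like, not a built-in feature of factor; group_levels, group_labels and original are names I made up for the illustration:

```r
# Original codes and their labels, kept once, in matching order
group_levels <- c(1, 2, 3, 9)
group_labels <- c("G1", "G2", "G3", "Unknown")

group_f <- factor(c(1, 2, 3, 9, 3, 2, 9, 1, 9, 3, 2),
                  levels = group_levels,
                  labels = group_labels)

# The internal integer codes of the factor (1..4) are positions in
# group_levels, so indexing by them gives back the original numbers
original <- group_levels[as.integer(group_f)]
```

This avoids storing a second full-length column, but the mapping lives outside the factor, which is what I would like to avoid if there is a cleaner way.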