Reorder levels of a factor without changing order of values

I have a data frame with some numerical variables and some categorical factor variables. The order of the levels for those factors is not the way I want it to be.

numbers <- 1:4
letters <- factor(c("a", "b", "c", "d"))
df <- data.frame(numbers, letters)
df
#   numbers letters
# 1       1       a
# 2       2       b
# 3       3       c
# 4       4       d

If I change the order of the levels, the letters are no longer paired with their corresponding numbers (my data is total nonsense from this point on).

levels(df$letters) <- c("d", "c", "b", "a")
df
#   numbers letters
# 1       1       d
# 2       2       c
# 3       3       b
# 4       4       a

I simply want to change the level order so that, when plotting, the bars are shown in the desired order, which may differ from the default alphabetical order.


Use the levels argument of factor:

# factor() is explicit because stringsAsFactors defaults to FALSE since R 4.0.0
df <- data.frame(f = 1:4, g = factor(letters[1:4]))
df
#   f g
# 1 1 a
# 2 2 b
# 3 3 c
# 4 4 d

levels(df$g)
# [1] "a" "b" "c" "d"

df$g <- factor(df$g, levels = letters[4:1])
levels(df$g)
# [1] "d" "c" "b" "a"

df
#   f g
# 1 1 a
# 2 2 b
# 3 3 c
# 4 4 d
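
To check that only the level order moved, here is a minimal sketch that compares the column before and after the call. The variable names before and counts are only illustrative.

```r
# Rebuild the factor with an explicit level order, then check what changed.
df <- data.frame(f = 1:4, g = factor(c("a", "b", "c", "d")))
before <- as.character(df$g)

df$g <- factor(df$g, levels = c("d", "c", "b", "a"))

# The value stored in each row is the same as before ...
identical(as.character(df$g), before)
# [1] TRUE

# ... only the level order, and with it the underlying integer codes, differs.
levels(df$g)
# [1] "d" "c" "b" "a"
as.integer(df$g)
# [1] 4 3 2 1

# Anything that walks the levels in order, such as table(), follows the new order.
counts <- table(df$g)
names(counts)
# [1] "d" "c" "b" "a"
```

Plotting functions that take their category order from the levels, for example barplot(counts), should then draw the bars in that order.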

Some more, just for the record:

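## reorder() is a generic from base R (stats); the new.order argument
## belongs to the reorder.factor method in gdata, so load gdata first
library(gdata)
df$letters <- reorder(df$letters, new.order = letters[4:1])

## same thing, calling the gdata method directly
df$letters <- reorder.factor(df$letters, new.order = letters[4:1])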

You may also find Relevel (from the Epi package) and combine_factor (from the reshape package) useful.
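
For comparison, a minimal sketch of what reorder does with no extra packages loaded: it orders the levels by a summary (the mean by default) of a second variable instead of taking the new order directly. Negating numbers is just an illustrative way to get the reversed order here.

```r
df <- data.frame(numbers = 1:4, letters = factor(c("a", "b", "c", "d")))

# Levels are sorted by the mean of -numbers within each level,
# i.e. in descending order of numbers.
df$letters <- reorder(df$letters, -df$numbers)

levels(df$letters)
# [1] "d" "c" "b" "a"

# The values themselves stay attached to their rows.
as.character(df$letters)
# [1] "a" "b" "c" "d"
```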


So what you want, in R lexicon, is to change only the labels for a given factor variable (i.e., leave the data, as well as the factor levels, unchanged).

df$letters <- factor(df$letters, labels = c("d", "c", "b", "a"))

Given that you want to change only the datapoint-to-label mapping and not the data or the factor schema (how the datapoints are binned into individual bins or factor values), it might help to know how the mapping is originally set when you initially create the factor.

The rules are simple:

  • labels are mapped to levels by index (i.e., the value at levels[2] is given the label labels[2]);
  • factor levels can be set explicitly by passing them in via the levels argument; or
  • if no value is supplied for the levels argument, the default is used, which is the sorted result of calling unique on the data vector (the x argument);
  • labels can be set explicitly via the labels argument; or
  • if no value is supplied for the labels argument, the default is used, which is just the levels vector.
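
A minimal sketch of these rules on a four-element vector, contrasting what the levels and labels arguments each do. The names x, f_default, f_levels and f_labels are only illustrative.

```r
x <- c("a", "b", "c", "d")

# Defaults: levels are the sorted unique values, and labels equal the levels.
f_default <- factor(x)
levels(f_default)
# [1] "a" "b" "c" "d"

# levels = ... sets the level order; every element keeps its value.
f_levels <- factor(x, levels = c("d", "c", "b", "a"))
as.character(f_levels)
# [1] "a" "b" "c" "d"
as.integer(f_levels)
# [1] 4 3 2 1

# labels = ... renames level i to labels[i]; the element values change.
f_labels <- factor(x, labels = c("d", "c", "b", "a"))
as.character(f_labels)
# [1] "d" "c" "b" "a"
as.integer(f_labels)
# [1] 1 2 3 4
```

Comparing as.character() of each result with x shows which argument leaves the stored values alone and which one relabels them.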