Problems with grouping factor, data frame, and tapply
I am very new to R and stats in general and am having trouble getting tapply() to work. I have a data frame with 15 columns and thousands of rows. I made a bunch of logical vectors using something like y1 <- ((x > 0) & (x <= 5)) and similar, where x is a column in the data frame. These logical vectors were then combined and converted into a grouping factor using factor(). Everything looks to be working fine with this.
The problem is that when I call tapply(dataframe, group, sample, size=20), where group is the grouping factor, I get the error: 'arguments must have same length'. When I try length(dataframe) I get the number of columns in the data frame (only 15), whereas length(group) returns the number of rows (thousands). Is there an error in the way I'm creating my logical vectors and grouping factor?
Here's the output from dput() as Maxim.K suggested: (sorry, it's not very tidy)
structure(list(Lat = c(-90L, -90L, -90L, -90L, -90L, -90L, -90L,
-90L, -90L, -90L, -90L, -90L, -90L, -90L, -90L), Lon = -180:-166,
Jan = c(2.79, 2.79, 2.79, 2.79, 2.79, 2.79, 2.79, 2.79, 2.79,
2.79, 2.79, 2.79, 2.79, 2.79, 2.79), Feb = c(2.35, 2.35,
2.35, 2.35, 2.35, 2.35, 2.35, 2.35, 2.35, 2.35, 2.35, 2.35,
2.35, 2.35, 2.35), Mar = c(0.49, 0.49, 0.49, 0.49, 0.49,
0.49, 0.49, 0.49, 0.49, 0.49, 0.49, 0.49, 0.49, 0.49, 0.49
), Apr = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0),
May = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), Jun = c(0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), Jul = c(0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), Aug = c(0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), Sep = c(0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), Oct = c(1.75, 1.75, 1.75,
1.75, 1.75, 1.75, 1.75, 1.75, 1.75, 1.75, 1.75, 1.75, 1.75,
1.75, 1.75), Nov = c(2.77, 2.77, 2.77, 2.77, 2.77, 2.77,
2.77, 2.77, 2.77, 2.77, 2.77, 2.77, 2.77, 2.77, 2.77), Dec = c(2.65,
2.65, 2.65, 2.65, 2.65, 2.65, 2.65, 2.65, 2.65, 2.65, 2.65,
2.65, 2.65, 2.65, 2.65), Ann = c(1.07, 1.07, 1.07, 1.07,
1.07, 1.07, 1.07, 1.07, 1.07, 1.07, 1.07, 1.07, 1.07, 1.07,
1.07)), .Names = c("Lat", "Lon", "Jan", "Feb", "Mar", "Apr",
"May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec", "Ann"
), row.names = c(NA, 15L), class = "data.frame")
And for group:
15 values from the head (from dput())
structure(c(8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L,
8L, 8L), .Label = c("1", "2", "3", "4", "5", "6", "7", "8"), class = "factor")
... and from the tail
structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L), .Label = c("1", "2", "3", "4", "5", "6", "7", "8"), class = "factor")
I'm trying to use tapply() to take a random sample of size 20 from each of the 8 categories.
[edit] Totally unsurprisingly, the problem was not with the question and requirements but with my comprehension. I misread the question; in fact, I was only supposed to sample from one column, not from the entire data frame.
tapply can be used here; you just have to add the group vector to your data.frame and then use tapply, as in:
# Generating a 'group' vector with variability in its values
# and merging it to the existing data.frame (FOO)
set.seed(1)
FOO$group <- as.factor(sample( 1:8, nrow(FOO), replace=TRUE))
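# Using tapply on a single column: X should be a vector with the same
# length as the grouping factor, not the whole data.frame.
# 'Ann' is used here only as an example column.
tapply(FOO$Ann, FOO$group, sample, size=20, replace=TRUE)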
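The length mismatch reported in the question can be checked on a small made-up data frame. The sketch below is only an illustration: the name toy and its values are assumptions and not the asker's data. It shows that length() of a data frame counts columns, while a single column has the same length as the grouping factor and can be passed to tapply.

```r
# Hypothetical toy data standing in for the real data frame
set.seed(42)
toy <- data.frame(Lat = rep(c(-90, 0, 90), each = 10),
                  Ann = round(runif(30, 0, 5), 2))
toy$group <- factor(rep(1:3, each = 10))

length(toy)        # 3: a data frame is a list of columns
nrow(toy)          # 30
length(toy$group)  # 30: matches the rows, not the columns

# Sample 20 values (with replacement) from one column within each group
samples <- tapply(toy$Ann, toy$group, sample, size = 20, replace = TRUE)
```

The result has one element per factor level, each holding the 20 sampled values. Sampling with replace = TRUE lets this work even when a group has fewer than 20 rows.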
This may be the answer to your homework.
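As for the grouping step described in the question (range tests on a column combined into a factor), here is a minimal sketch under assumptions: the vector x, the three ranges, and the names group_manual and group_cut are made up for illustration, since the question does not show how the logical vectors were combined. It also shows cut(), which builds the same kind of factor in one call.

```r
# Hypothetical values standing in for one column of the data frame
x <- c(0.5, 3, 5, 7.2, 10, 12.5, 0, 15)

# Logical tests in the style of the question
y1 <- (x > 0) & (x <= 5)
y2 <- (x > 5) & (x <= 10)
y3 <- (x > 10) & (x <= 15)

# One way to combine them: code each value by the test it passes (NA if none)
group_manual <- factor(ifelse(y1, 1, ifelse(y2, 2, ifelse(y3, 3, NA))))

# cut() does the same with right-closed intervals (0,5], (5,10], (10,15]
group_cut <- cut(x, breaks = c(0, 5, 10, 15), labels = c("1", "2", "3"))

# Either factor has one entry per value of x, as tapply requires
length(group_cut) == length(x)
```

Values that fall outside every range (here x = 0) become NA in both versions, and tapply leaves NA groups out of its result.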